Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide

Posters

Poster Categories
Poster Schedule
Preparing your Poster - Information and Poster Size
How to mount your poster
Print your poster in Basel

View Posters By Category

Session A: (July 22 and July 23)
Session B: (July 24 and July 25)

Presentation Schedule for July 22, 6:00 pm – 8:00 pm

Presentation Schedule for July 23, 6:00 pm – 8:00 pm

Presentation Schedule for July 24, 6:00 pm – 8:00 pm

Session A Poster Set-up and Dismantle
Session A Posters set up: Monday, July 22 between 7:30 am - 10:00 am
Session A Posters should be removed at 8:00 pm, Tuesday, July 23.

Session B Poster Set-up and Dismantle
Session B Posters set up: Wednesday, July 24 between 7:30 am - 10:00 am
Session B Posters should be removed at 2:00 pm, Thursday, July 25.

K-01: Non-synonymous to synonymous substitutions suggest that orthologs tend to keep their functions, while paralogs are a source of functional novelty
COSI: EvoCompGen COSI
  • Gabriel Moreno-Hagelsieb, Wilfrid Laurier University, Canada
  • Mario Esposito, Wilfrid Laurier University, Canada

Short Abstract: Because orthologs diverge after speciation events, and paralogs after gene duplication, it is expected that orthologs should tend to keep their functions, while paralogs have been proposed as a source of new functions. This does not mean that paralogs should diverge much more than orthologs, but it does mean that, if there is a difference, orthologs should be the more functionally stable. Since protein functional divergence follows from non-synonymous substitutions, here we present an analysis based on the ratio of non-synonymous to synonymous substitutions (dN/dS). The results showed orthologs to have noticeably and statistically significantly lower values of dN/dS than paralogs, not only confirming that orthologs keep their functions better, but also suggesting that paralogs are a ready source of functional novelty.
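As a rough illustration of the distinction the dN/dS ratio rests on, the sketch below classifies codon differences between two aligned coding sequences as synonymous or non-synonymous using the standard genetic code. It is a simplification and an assumption on our part: a real dN/dS estimate also normalises by the number of synonymous and non-synonymous sites and corrects for multiple substitutions, and the abstract does not say which estimator was used. The function name and example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codon_differences(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Both sequences must be gap-free, in frame, and of equal length.
    These are raw counts, not per-site rates, so they are not dN or dS.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and equal in length")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1      # same amino acid: synonymous change
        else:
            nonsyn += 1   # different amino acid: non-synonymous change
    return syn, nonsyn


# GCT -> GCC keeps alanine; AAA -> AGA changes lysine to arginine.
print(count_codon_differences("ATGGCTAAA", "ATGGCCAGA"))
```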

K-02: Using the evolutionary model of the secondary structure prediction space to align the amino acid sequences
COSI: EvoCompGen COSI
  • Burkhard Rost, Technical University of Munich, Germany
  • Jhih-Siang Lai, The University of Queensland, Australia
  • Mikael Boden, The University of Queensland, Australia
  • Bostjan Kobe, The University of Queensland, Australia

Short Abstract: The evolutionary model of the prediction space that characterises both amino acid and secondary structure state has not been well studied. Pairwise alignment is important because it provides the initial information for multiple alignment and more advanced techniques. In this work, we built a 60-state evolutionary model of the prediction space and show that it can produce accurate pairwise alignments through its related odds scoring matrix. We collected 147,667 protein secondary structure predictions from PredictProtein to build the model. We compiled a dataset of 4,471 pairwise structure superpositions of remote proteins, in which the sequence-structure information is from Pfam and only high-resolution structures with good RMSD are considered. We implemented the Needleman-Wunsch global alignment method with affine gap penalties and the 60-state scoring matrix, and compared the alignments with those derived from BLOSUM matrices. We used Modeller to build 3D models based on the alignments. The pairwise alignments derived from the 60-state scoring matrices correlate better with superpositions by the evolutionary distances of both amino acid (0.60) and DSSP (0.55); and the coverage of the structure superposition reaches 0.63. The alignments derived from the 60-state matrix also improve predicted 3D models in terms of the QMEANS score.
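For readers unfamiliar with global alignment under affine gap penalties, the following is a minimal sketch of the three-matrix (Gotoh-style) recurrence that computes the optimal score. It is not the authors' implementation: the scoring function here is a toy match/mismatch scheme standing in for the 60-state matrix, and the gap parameters are arbitrary assumptions. Sequence elements can be any comparable tokens, so states combining an amino acid with a secondary-structure label could be passed as tuples.

```python
NEG_INF = float("-inf")


def affine_global_score(a, b, score, gap_open=-10, gap_extend=-1):
    """Optimal global alignment score with affine gaps.

    A gap of length L costs gap_open + (L - 1) * gap_extend.
    M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap;
    Y: b[j-1] aligned to a gap.
    """
    n, m = len(a), len(b)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):          # leading gap in b
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):          # leading gap in a
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = score(a[i - 1], b[j - 1])
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


def toy_score(x, y):
    """Hypothetical stand-in for a substitution matrix lookup."""
    return 2 if x == y else -1


print(affine_global_score("ACGT", "ACGT", toy_score))
print(affine_global_score("ACGT", "AT", toy_score))
```

Keeping separate matrices for the gap states is what lets one long gap cost less than several short ones, which a single-matrix linear-gap recurrence cannot express.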

K-03: The Morphometric and Genetic Similarities/Difference that Contribute to Facial Morphology among the Three Major Ethnic Groups in Nigeria
COSI: EvoCompGen COSI
  • Abiodun Olowo, UNIVERSITY OF IBADAN, Nigeria

Short Abstract: Genetic studies, especially of monozygous twins in European samples, have shown that facial morphology is under strong genetic influence, but the extent of this heritability on a scale as wide as ethnicity is yet to be determined. Various GWAS studies have revealed some face-shaping genes/SNPs in diverse populations, but this is yet to be determined for West Africa. In this study, we collected 3D phenotypes and saliva DNA from representatives of three major ethnic groups in Nigeria (a West African country) for the purpose of mapping out the face-shaping genes/SNPs and the similarities or differences that exist between these populations. We performed geometric morphometrics on the facial pictures and SNP/Illumina genotyping on the saliva DNA. The study confirms the genetic diversity of West Africa and shows the need to corroborate the popular YRI genetic data that typically describes Nigerian genomics. Through this study, we hope to develop a model for reconstruction of facial features from Nigerian genetic samples after the order of Claes et al. (2018).

K-04: Evolutionary Dynamics and Function of Hydra Transposons
COSI: EvoCompGen COSI
  • Wai Yee Wong, University of Vienna, Austria

Short Abstract: More than half of the H. vulgaris genome is composed of transposable elements (TEs). While TEs are thought to be a major player in facilitating adaptation, the role of the reported TE expansions in hydra remains unclear. In this study, we aimed to examine the evolutionary history of TEs and their function in hydra by a comparative genomic approach. To improve on the fragmented annotation produced during TE classification, we developed a novel meta-pipeline, RepeatCraft, which defragments closely spaced repeat loci in the genomes. After applying the pipeline, the total number of repeat loci in H. vulgaris was reduced by about 10%. We also performed a comparative analysis using the transcriptome and genome data from a green hydra and four brown hydras, in reference to the H. vulgaris genome. The results reveal a significant expansion of a single LINE family specific to brown hydras, compared with the much less abundant LINE elements in the green hydra. We propose that the expansion of this LINE family led to the significant genome size increase in brown hydras. We are studying the role of this TE expansion in hydrozoan evolution and unravelling changes in genomic architecture associated with it.

K-05: Mapping global and local coevolution across 600 species to identify novel homologous recombination repair genes
COSI: EvoCompGen COSI
  • Dana Sherill-Rofe, Hebrew University, Israel
  • Dolev Rahat, Hebrew University, Israel
  • Aviad Zick, Hadassah Medical Center, Israel
  • Yuval Tabach, The Hebrew University of Jerusalem, Israel

Short Abstract: Mutations in homologous recombination repair (HRR) genes can result in increased mutation rate and genomic rearrangements and are associated with numerous genetic disorders and cancer. Despite intensive research, the HRR pathway is not yet fully mapped. Phylogenetic profiling analysis, which detects functional linkage between genes using coevolution, is a powerful approach to identify factors in many pathways. Nevertheless, phylogenetic profiling has limited predictive power when analyzing pathways with complex evolutionary dynamics such as the HRR. To map novel HRR genes systematically, we developed a novel algorithm which detects local coevolution across hundreds of genomes and points to the evolutionary scale (e.g., mammals, vertebrates, animals, plants) at which coevolution occurred. By using this algorithm, we identified dozens of unrecognized genes that coevolved with the HRR pathway, either globally across all eukaryotes or locally in different clades. We validated eight genes in functional biological assays to have a role in DNA repair. These genes might lead to a better understanding of missing heritability in HRR-associated cancers (e.g., hereditary breast and ovarian cancer). Our platform presents an innovative approach to predict gene function, identify novel factors related to different diseases and pathways, and characterize gene evolution.
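The abstract does not describe the algorithm itself, so the following is only a generic illustration of the idea behind global versus local phylogenetic profiling: two genes' presence/absence profiles are correlated across all species and then separately within each clade. The profiles, clade assignments, and function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def clade_correlations(profile_a, profile_b, clades):
    """Correlate two profiles within each clade.

    clades maps a clade name to the list of species indices it contains.
    """
    return {
        name: pearson([profile_a[i] for i in idx], [profile_b[i] for i in idx])
        for name, idx in clades.items()
    }


# Toy presence (1) / absence (0) profiles over eight species.
gene_a = [1, 1, 0, 0, 1, 0, 1, 0]
gene_b = [1, 1, 0, 0, 0, 1, 0, 1]
clades = {"animals": [0, 1, 2, 3], "plants": [4, 5, 6, 7]}

print(pearson(gene_a, gene_b))                     # global signal
print(clade_correlations(gene_a, gene_b, clades))  # per-clade signal
```

In this toy case the global correlation cancels out while the two clades show strong, opposite signals, which is the kind of pattern a purely global profile comparison would miss.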

K-06: The Ortholog Conjecture Revisited: the Value of Orthologs and Paralogs in Function Prediction
COSI: EvoCompGen COSI
  • Moses Stamboulian, Indiana University Bloomington, United States
  • Rafael Guerrero, Indiana University Bloomington, United States
  • Matthew Hahn, Indiana University Bloomington, United States
  • Predrag Radivojac, Northeastern University, United States

Short Abstract: Computational prediction of gene function is a key step in making full use of newly sequenced genomes. Function is generally predicted by transferring annotations from homologous genes for which experimental evidence exists. The “ortholog conjecture” proposes that orthologous genes should be preferred when making such predictions, as they evolve functions more slowly than paralogous genes. Previous research has provided little support for the ortholog conjecture, though the incomplete nature of the data used cast doubt on the conclusions. Here we use experimental annotations from over 40,000 proteins (drawn from over 80,000 publications) to revisit the ortholog conjecture in two pairs of species: human and mouse, and Saccharomyces cerevisiae and Schizosaccharomyces pombe. By distinguishing between the evolution of function and the prediction of function, we find strong evidence against the ortholog conjecture in the context of function prediction. Furthermore, we quantify the amount of data that is ignored when paralogs are discarded, alongside the resulting loss in prediction accuracy in both species pairs. Our results support the view that the types of homologs used are largely irrelevant to the task of function prediction. We should instead aim to maximize the amount of data we use for this task, regardless of homology.

K-07: SonicParanoid: fast, accurate and easy orthology inference
COSI: EvoCompGen COSI
  • Salvatore Cosentino, The University of Tokyo, Japan
  • Wataru Iwasaki, The University of Tokyo, Japan

Short Abstract: Orthology inference constitutes a common base of many genome-based studies, as a pre-requisite for annotating new genomes, finding target genes for biotechnological applications and revealing the evolutionary history of life. Although its importance keeps rising with the ever-growing number of sequenced genomes, existing tools are computationally demanding and difficult to employ. Here, we present SonicParanoid, which is orders of magnitude faster than, but comparably accurate to, the well-established tools with a balanced precision-recall trade-off. Furthermore, SonicParanoid substantially relieves the difficulties of orthology inference for those who need to construct and maintain their own genomic datasets. SonicParanoid is available with a GNU GPLv3 license on the Python Package Index and BitBucket. Documentation is available at http://iwasakilab.bs.s.u-tokyo.ac.jp/sonicparanoid.

K-08: Sampling uncertainty for virus phylogeography
COSI: EvoCompGen COSI
  • Matthew Scotch, Arizona State University, United States
  • Tasnia Tahsin, Arizona State University, United States
  • Davy Weissenbacher, University of Pennsylvania, United States
  • Karen O'Connor, University of Pennsylvania, United States
  • Arjun Magge, Arizona State University, United States
  • Matteo Vaiente, Arizona State University, United States
  • Marc A. Suchard, University of California, Los Angeles, United States
  • Gonzalez-Hernandez Graciela, University of Pennsylvania, United States

Short Abstract: Discrete phylogeography using software like BEAST considers the sampling location of each taxon as fixed to a single location without uncertainty. We relaxed this assumption and allowed for analytic integration of uncertainty for virus phylogeography. We considered two influenza case studies, H5N1 and pdm09, and implemented scenarios in which 25 per cent of the taxa had different amounts of sampling uncertainty. We also included two scenarios that eliminated uncertainty via heuristic approaches: assignment to a centroid location (CNTR) or the largest population in the country (POP). We compared all scenarios to a reference standard (RS) with known locations. We studied posterior metrics including: persistence, migration rates, trunk rewards, and root state posterior probability. The scenarios with sampling uncertainty were closer to the RS than CNTR and POP. For H5N1, the absolute error of virus persistence had a median range of 0.005-0.047 for scenarios with sampling uncertainty versus 0.063-0.075 for CNTR and POP. When considering the root state, we found all but one of the H5N1 scenarios with sampling uncertainty had agreement with the RS on the origin of the outbreak whereas both CNTR and POP disagreed. We found that assigning geospatial uncertainty improves virus phylogeography as compared to ad-hoc heuristics.

K-09: Transcriptomic analysis of the developmental stages of tardigrades reveals piRNA and hormone regulation
COSI: EvoCompGen COSI
  • Yuki Yoshida, Keio University, Japan
  • Kazuharu Arakawa, Keio University, Japan

Short Abstract: Tardigrades are microscopic organisms of which terrestrial species are capable of tolerating complete desiccation, a phenomenon known as anhydrobiosis. Tardigrades are unusual among anhydrobiotes in that they can enter the anhydrobiotic state at any point of their life cycle, including during embryonic development, suggesting possible plasticity in this stage. To this end, we screened for coordinated gene expression during development of two species of tardigrades, Hypsibius exemplaris and Ramazzottius varieornatus, and observed by transcriptome sequencing that the 3rd day of embryonic development is critical only in H. exemplaris, the more desiccation-sensitive of the two. Piwi induction implied the existence of piRNAs, and we predicted putative piRNA clusters within the H. exemplaris genome with proTRAC and PILFER. Exposure of H. exemplaris embryos to juvenile hormones affected egg hatching but not embryonic development. These observations suggest a critical phase in H. exemplaris embryonic development, whose inhibitors affect the regulation of hatching.

K-10: AmoCoala: Towards a more realistic model for cophylogeny reconstruction via an approximate Bayesian computation
COSI: EvoCompGen COSI
  • Blerina Sinaimeri, INRIA, France
  • Laura Urbini, INRIA, France
  • Catherine Matias, CNRS, France
  • Marie-France Sagot, Inria, Université Claude Bernard Lyon 1, France

Short Abstract: Nowadays, the most widely used model in studies of the coevolution of hosts and symbionts is phylogenetic tree reconciliation. A crucial issue in this model is that, from a biological point of view, reasonable cost values for an event-based reconciliation are not easily chosen. Different methods have been developed to infer the set of costs to be used for a given pair of host and symbiont trees. However, a major limitation of these methods is their inability to model the "invasion" of different host species by the same symbiont species, which is often observed in reality. Here we propose a method, called AmoCoala, that, for a given pair of host and symbiont trees, estimates the frequency of the cophylogeny events in the presence of invasion events, based on an approximate Bayesian computation (ABC) approach that may be more efficient than a classical likelihood method. On the one hand, the algorithm we propose provides more confidence in the set of costs to be used for a given pair of host and parasite trees; on the other hand, it allows the frequency of the events to be estimated for large datasets. We evaluated our method on synthetic and real datasets.

K-11: Relationship between numbers of interaction and synonymous substitution for proteins expressed in E. coli
COSI: EvoCompGen COSI
  • Serika Taga, Meiji University, Japan
  • Nobuyuki Uchikoga, Meiji University, Japan
  • Takanori Sasaki, Meiji University, Japan

Short Abstract: Recently, gene sequences and protein-protein interaction networks for various species have been stored in databases. For gene sequences, although the rate of non-synonymous substitution is known to be related to the functional importance of each protein, the relationship between the rate of synonymous substitution and protein function has not been clear. In this study, we investigated the relationship between the number of interactions (degree) of each protein in the protein interaction network and the number of synonymous substitutions for E. coli genes. In practice, we performed multiple sequence alignment for 858 genes among 39 E. coli strains, and calculated the number of synonymous substitutions per amino acid site. Next, the number of synonymous substitutions obtained for each protein was plotted against the degree. In theory, the average number of synonymous codons per amino acid site for each protein was constant, independent of protein degree. However, the actual number of synonymous substitutions tended to decrease as the degree of the protein increased over the range of 1 to 24. These results suggest that the number of interactions a protein has affects its synonymous substitution frequency.

K-12: Rearrangement Scenarios Guided by Chromatin Structure
COSI: EvoCompGen COSI
  • Pijus Simonaitis, LIRMM, Université Montpellier, France
  • Krister Swenson, CNRS, Université de Montpellier, France

Short Abstract: Rearrangements of blocks of gene-coding DNA are responsible for diversity on many scales. They can trigger the advent of phenotypic changes, and are involved in devastating genomic disorders. There is a quarter century of theoretical and algorithmic work devoted to finding and sampling scenarios of rearrangements that could have transformed the gene order of one species into the gene order of another. Nonetheless, there is still a lack of methodology for the inference of scenarios which conform to some biological constraints. We have defined a framework for cost-constrained rearrangements and devised algorithms for finding optimal scenarios within this framework. Our work is motivated by two hypotheses: first, that the sequences undergoing rearrangement need to be in close spatial proximity in the nucleus to become joined; and second, that the genome's spatial organization is somewhat conserved across evolutionary distances. We use Hi-C data to infer the evolutionary scenarios maximizing the co-locality of the breakpoints. This enables us to study our hypotheses in detail, and preliminary results concerning Drosophila species are in line with them. Our framework is liberal and can be used with data concerning active/repressive epigenetic marks, intergenic lengths, repetitive elements or other biological information relevant to rearrangements.

K-13: IMAP: Chromosome-level genome assembler combining multiple de novo assemblies
COSI: EvoCompGen COSI
  • Giltae Song, Pusan National University, South Korea
  • Juyeon Kim, Konkuk University, South Korea
  • Seokwoo Kang, Pusan National University, South Korea
  • Hoyong Lee, Pusan National University, South Korea
  • Daehong Kwon, Konkuk University, South Korea
  • Daehwan Lee, Konkuk University, South Korea
  • Gregory Lang, Lehigh University, United States
  • J. Michael Cherry, Stanford University, United States

Short Abstract: Genomic data have become major resources to understand complex mechanisms at fine-scale temporal resolution in functional and evolutionary genetic studies, including human diseases, such as cancers. Recently, a large number of whole genomes of evolving populations of yeast (Saccharomyces cerevisiae W303 strain) were sequenced in a time-dependent manner to identify temporal evolutionary patterns. For this type of study, a chromosome-level sequence assembly of the strain or population at time zero is required to compare with the genomes derived later. However, there is no fully automated computational approach to establish the chromosome-level genome assembly using unique features of sequencing data in experimental evolution studies. In this study, we developed a new software pipeline, integrative meta-assembly pipeline (IMAP), to build chromosome-level genome sequence assemblies by combining multiple initial assemblies from only short-read sequencing data. We significantly improved the continuity and accuracy of the genome assembly using a large collection of sequencing data and hybrid assembly approaches. We validated our pipeline by generating chromosome-level assemblies of several fungal strains, and compared our results with assemblies built using long-read sequencing and various assembly evaluation metrics. Our pipeline combines the strengths of reference-guided and meta-assembly approaches.

K-14: Annotation of pseudogenes and investigation of their evolutionary origin and formation mechanisms in trypanosomatids pathogenic to humans
COSI: EvoCompGen COSI
  • Mayla Abrahim, Fundação Oswaldo Cruz (FIOCRUZ), Brazil
  • Fernando Alvarez-Valín, Seccion Biomatematica, Universidad de la República del Uruguay, Uruguay
  • Luisa Berna, Institut Pasteur de Montevideo, Uruguay
  • Luiza Pereira, Fundação Oswaldo Cruz (FIOCRUZ), Brazil
  • Patrícia Cuervo, Fundação Oswaldo Cruz (FIOCRUZ), Brazil
  • Claudia Avila-Levy, Fundação Oswaldo Cruz (FIOCRUZ), Brazil
  • Antonio Basílio, Fundação Oswaldo Cruz (FIOCRUZ), Brazil
  • Marcos Catanho, Fundação Oswaldo Cruz (FIOCRUZ), Brazil

Short Abstract: BACKGROUND: Genome sequencing and transcriptome analyses of protozoans of the family Trypanosomatidae are important to study the evolution of these organisms, offering new opportunities for better understanding of relevant biological aspects, such as pseudogenization. Pseudogenes provide a powerful tool to record the evolution of genomes, and experimental evidence suggests that some of these molecular relics are biologically active in the regulation of gene expression in several distinct lineages. METHODS: Genome sequences of human-infective trypanosomatids were obtained from public databases. Pseudogenes were identified by sequence similarity between intergenic regions of the analyzed genomes measured against a dataset of reference protein sequences, as well as based on the presence of degeneration signals typically associated with the loss of function in these genomic segments. RESULTS: A total of 51.995.092 sequences were recognized as putative pseudogenes in the selected group of parasites in this work. CONCLUSION: Since this is an ongoing project, we intend to contribute to a better understanding of the evolutionary origin and mechanisms of pseudogene formation in these organisms, as well as to investigate their potential involvement in the regulation of gene expression, i.e., in mechanisms of post-transcriptional regulation in trypanosomatids, a crucial phenomenon still mostly unknown in these organisms.

K-15: Spatial structure governs the mode of tumour evolution
COSI: EvoCompGen COSI
  • Robert Noble, ETH Zurich, Switzerland
  • Dominik Burri, University of Basel, Switzerland
  • Jakob Kalther, University Hospital RWTH Aachen, Germany
  • Niko Beerenwinkel, ETH Zurich, Switzerland

Short Abstract: Characterizing the mode – the way, manner, or pattern – of evolution in tumours is important for clinical forecasting and optimizing cancer treatment. DNA sequencing studies have inferred various modes, including branching, punctuated and neutral evolution, but it is unclear why a particular pattern predominates in any given tumour. Here we propose that differences in tumour architecture alone can explain the variety of observed patterns. We examine this hypothesis using spatially explicit population genetic models and demonstrate that, within biologically relevant parameter ranges, human tumours are expected to exhibit four distinct onco-evolutionary modes (oncoevotypes): rapid clonal expansion (predicted in leukaemia); progressive diversification (in colorectal adenomas and early-stage colorectal carcinomas); branching evolution (in invasive glandular tumours); and effectively almost neutral evolution (in certain non-glandular and poorly differentiated solid tumours). We thus provide a simple, mechanistic explanation for a wide range of empirical observations. Oncoevotypes are governed by modes of cell dispersal and cell-cell interactions, which we show are essential factors in accurately characterizing, forecasting and controlling tumour evolution.

K-16: Efficient homolog identification and interactive web-based whole genome alignment
COSI: EvoCompGen COSI
  • Daniel Tello, Universidad de Los Andes, Colombia
  • Rogelio Garcia, Universidad de los Andes, Colombia
  • Camilo Escobar-Velásquez, Universidad de los Andes, Colombia
  • Mario Linares-Vásquez, Universidad de los Andes, Colombia
  • Jorge Duitama, Universidad de los Andes, Colombia

Short Abstract: Recent developments in long-read high-throughput sequencing technologies have enabled high-quality genome assemblies for an unprecedented number of species. These genomes represent unique data resources to elucidate complex patterns of evolutionary events through comparative genomics. A basic operation in comparative genomics is the alignment of complete genomes. Although genome alignment is a classical problem in bioinformatics, recent developments in data structures, algorithms and technologies create opportunities to develop novel bioinformatic tools for this problem. Here we present our software solution for whole genome alignment through efficient identification of synteny blocks built from large chains of orthologous genes. The algorithm performs k-mer searches on FM-indexes built from the proteomes of annotated genomes to efficiently identify paralogs and orthologs. Benchmark experiments against commonly used tools for ortholog identification and synteny analysis show that construction of ortholog chains enables alignments between chromosomes of large genomes within minutes of computation. Using state-of-the-art data visualization technologies, we provide novel interactive views of the alignments provided by our software. Our genome aligner is already available as open source software. We expect that this development represents a significant contribution to the field of comparative genomics, facilitating further discoveries in evolution, functional genomics and related fields.
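To make the k-mer search step concrete, here is a minimal sketch of an FM-index with backward search for counting exact occurrences of a k-mer in a text. It is an illustration only and not the authors' software: the index is built by naive suffix sorting, which is fine for toy inputs but far too slow for real proteomes, and the function names and example peptide are hypothetical. The text is assumed not to contain the `$` sentinel.

```python
def build_fm_index(text):
    """Build a toy FM-index: (first, occ, n) for text + '$'."""
    text += "$"  # sentinel, sorts before all letters in ASCII
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in suffix_array)
    alphabet = sorted(set(text))
    # first[c]: number of characters in the text strictly smaller than c.
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return first, occ, len(bwt)


def count_occurrences(index, pattern):
    """Count exact matches of pattern by backward search over the index."""
    first, occ, n = index
    lo, hi = 0, n  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = build_fm_index("MKVLAMKVL")
print(count_occurrences(index, "MKV"))
```

Each backward-search step narrows the suffix-array interval using one pattern character, so a query costs time proportional to the k-mer length rather than the text length.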

K-17: Reconstructing the phylogeny of Corynebacteriales while accounting for Horizontal Gene Transfer
COSI: EvoCompGen COSI
  • Nilson Da Rocha Coimbra, Universidade Federal de Minas Gerais, Brazil
  • Aristóteles Góes-Neto, Universidade Federal de Minas Gerais, Brazil
  • Vasco Azevedo, Universidade Federal de Minas Gerais, Brazil
  • Aida Ouangraoua, Université de Sherbrooke, Canada

Short Abstract: Classically, the reconstruction of bacterial phylogenies and the identification of new species using in silico approaches concatenate the small subunit of ribosomal RNA (16S rRNA) and single-copy genes into a multiple sequence alignment. However, the genomic content of extant microorganisms is affected by Horizontal Gene Transfer (HGT), whereas the reconstruction of microbial phylogenies using the classical sequence-based approach does not account for the presence of transferred genes. Here, we improved the methods of microbial phylogeny reconstruction while accounting for the presence of HGT genes. We present a new gene tree-based method to correct putative transferred genes and applied it to the phylogeny reconstruction of the Order Corynebacteriales, the largest clade in the Phylum Actinobacteria. We collected 360 Corynebacteriales genome accessions from the NCBI RefSeq database and reconstructed 17 phylogenies using gene tree-based and sequence-based methods. A confidence set of candidate trees was evaluated and selected using the approximately unbiased test of phylogenetic trees and the Robinson-Foulds topology test. The phylogeny of Corynebacteriales was sketched based on the support of the species-level conservation. A total of 16 out of 17 phylogenies exhibited highly resolved monophyletic groups, discriminating slow-growers from fast-growers in the Mycobacteriaceae family, and even the biovar speciation in Corynebacterium pseudotuberculosis.
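As background for the topology comparison mentioned above, the sketch below computes the Robinson-Foulds distance between two trees on the same leaf set, counting the non-trivial bipartitions found in one tree but not the other. Trees are written as nested tuples of leaf names, a representation chosen here for brevity; the function names and example trees are hypothetical and unrelated to the authors' pipeline.

```python
def collect_clades(tree, out):
    """Return the leaf set under `tree`, appending each internal node's leaf set to `out`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.append(leaves)
    return leaves


def bipartitions(tree):
    """Non-trivial splits of the leaf set induced by the tree's internal edges."""
    clades = []
    all_leaves = collect_clades(tree, clades)
    splits = set()
    for clade in clades:
        other = all_leaves - clade
        # Splits with a side of size 0 or 1 occur in every tree, so skip them.
        if len(clade) >= 2 and len(other) >= 2:
            splits.add(frozenset([clade, other]))
    return splits


def robinson_foulds(tree_1, tree_2):
    """Unnormalised RF distance: size of the symmetric difference of split sets."""
    return len(bipartitions(tree_1) ^ bipartitions(tree_2))


t1 = (("A", "B"), ("C", "D"), "E")
t2 = (("A", "C"), ("B", "D"), "E")
t3 = ((("A", "B"), "C"), ("D", "E"))
print(robinson_foulds(t1, t2), robinson_foulds(t1, t3))
```

Because each split is stored as an unordered pair of leaf sets, the result does not depend on where a tree happens to be rooted.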

K-18: Profiling an endophytic fungal genomic population structure and its relationships with host variation, geographical origin and ploidy
COSI: EvoCompGen COSI
  • Qianhe Liu, AgResearch, New Zealand
  • Linda Johnson, AgResearch, New Zealand
  • Minen Su, AgResearch, New Zealand
  • Anna Larking, AgResearch, New Zealand
  • Richard Johnson, AgResearch, New Zealand
  • Ruy Jauregui, AgResearch, Grasslands Research Centre, New Zealand
  • Paul Maclean, AgResearch, Grasslands Research Centre, New Zealand

Short Abstract: Germplasm screening has provided independent cultures of endophytic fungi, which reveal no variation when studied via ITS amplicon sequencing. The use of a reference-free Genotype By Sequencing (GBS) method allows genetic variation to be resolved, in accordance with broad host relationship parameters. The variant call algorithms used in this case, designed for diploid species, reveal a single outlier which appears to be the product of a hybridisation event. Further research into the genetic variation of these endophytes using a genomic reference to map GBS tags will provide a higher resolution picture and will allow us to further refine the genetic variation within a fungal species as well as inspect the ploidy of outlier samples.

K-19: Exploring genomic diversity in a haploid population using colored de Bruijn graphs: A case study on human mitochondrial genomes
COSI: EvoCompGen COSI
  • Jindan Guo, Beijing Normal University, China
  • Xia Han, Beijing Normal University, China
  • Kui Lin, Beijing Normal University, China

Short Abstract: A pan-genome ordering is important for characterizing genome-wide homologous relationships between a set of population genomes. As one type of genome graph, a Colored de Bruijn Graph (CdBG) includes linkage information, and its sub-structures, called superbubbles, can each be used to model complex local homologous relationships and to identify potential haplotypes. Here, we use the VARI software, which can construct a succinct CdBG structure, and apply it to a set of human mitochondrial genomes to comprehensively explore their kinship relationships. Before creating the CdBG, we first select an optimal k-mer size to guarantee that each genome within it is acyclic. Then, we classify all the nodes and decompose the CdBG into a list of nested superbubbles. Based on the first-class superbubbles we identified, a pan-genome ordering can be established. We then traverse the nodes using both DFS and BFS and compute various branch-based indices for each superbubble. With these we can define the distance matrix and infer the local kinship relationships between the genomes. In future, we will infer and clarify the genome-wide kinship relationships between the set of population genomes by integrating the information from each first-class superbubble in light of the pan-genome ordering system.
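A colored de Bruijn graph can be pictured as an ordinary de Bruijn graph whose edges remember which genomes contain them. The sketch below builds such a graph as a plain dictionary and checks whether a sequence's walk through the graph revisits a node for a given k. It is a toy under stated assumptions: it is not the succinct structure VARI builds, it treats sequences as linear although mitochondrial genomes are circular, and the function names and example sequences are hypothetical.

```python
from collections import defaultdict


def colored_de_bruijn(genomes, k):
    """Map each edge ((k-1)-mer, (k-1)-mer) to the set of genome names (colors) containing it."""
    edges = defaultdict(set)
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            edges[(kmer[:-1], kmer[1:])].add(name)
    return dict(edges)


def walk_is_acyclic(seq, k):
    """True if no (k-1)-mer node is visited twice along the sequence's walk."""
    nodes = [seq[i:i + k - 1] for i in range(len(seq) - k + 2)]
    return len(nodes) == len(set(nodes))


genomes = {"g1": "ACGTAC", "g2": "ACGGAC"}
graph = colored_de_bruijn(genomes, k=3)
print(graph[("AC", "CG")])  # edge shared by both genomes

# Choosing the smallest k (from a hypothetical candidate range) for which every walk is acyclic.
k_ok = next(k for k in range(3, 7) if all(walk_is_acyclic(s, k) for s in genomes.values()))
print(k_ok)
```

Edges whose color set contains every genome mark shared sequence, while edges carried by only some genomes mark the branches where variation, and hence bubble structure, arises.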

K-20: Assessment of the evolution of RETT syndrome-related proteins disordered regions and the implications of their point mutations
COSI: EvoCompGen COSI
  • Muhamad Fahmi, Ritsumeikan University, Japan
  • Gen Yasui, Ritsumeikan University, Japan
  • Yukihiko Kubota, Ritsumeikan University, Japan
  • Masahiro Ito, Ritsumeikan University, Japan

Short Abstract: Rett syndrome (RTT) is a progressive postnatal neurodevelopmental disorder affecting brain development and function during early childhood and is caused by mutation of either the MECP2, CDKL5, or FOXG1 gene, each of which encodes a protein with disordered regions, which commonly have fast rates of evolution. To understand the association between the pathogenicity of the point-mutated RTT proteins and the evolution of the disordered regions, we predicted the order-disorder propensity and post-translational modifications (PTMs), analyzed the rates of evolution per site of human MECP2, CDKL5, and FOXG1, and compared them with those of their orthologous proteins. We also assessed these analyses using the database of RTT point mutations. Additionally, we evaluated the related PTMs and the function in their downstream target genes that were classified based on phylogenetic profiling. Our results showed that despite having fast rates of evolution, the structures of disordered regions of these three proteins were highly constrained. These regions are important for displaying the PTM sites and hence, their aberration could impair the interaction of these proteins with their downstream targets. This affects the epigenetic regulation of gene expression, and thus causes RTT.

K-21: Phylogenetic profile analysis of Rett syndrome-related proteins
COSI: EvoCompGen COSI
  • Muhamad Fahmi, Ritsumeikan University, Japan
  • Gen Yasui, Ritsumeikan University, Japan
  • Yukihiko Kubota, Ritsumeikan University, Japan
  • Masahiro Ito, Ritsumeikan University, Japan
  • Takahiro Nakamura, Ritsumeikan University, Japan

Short Abstract: Rett syndrome (RTT) is a neurodevelopmental disorder of the cranial nervous system that predominantly affects females and is associated with mutations in MeCP2, CDKL5, and FOXG1. Mutations in these genes are known to cause both intellectual disabilities and epilepsy. The mechanism of RTT onset remains unknown and no fundamental treatment for RTT has been established. The current study utilized phylogenetic profiling to identify the network of proteins that interact with MeCP2, CDKL5, and FOXG1 and infer their evolutionary histories. MeCP2, CDKL5, and FOXG1 were found to interact with 148, 18, and 75 proteins, respectively. These human RTT-related proteins were classified into four classes: conserved in chordates (class 1), conserved in metazoans (class 2), conserved in plantae (class 3), and widely conserved in eukaryotes (class 4). Class 1 proteins interacting with MeCP2 were determined to be involved in epigenetic regulation of gene expression, which suggests an effect on the evolutionary process in chordates. Class 2 proteins interacting with MeCP2 were found to be involved in transcriptional regulation and are responsible for basal functions in multicellular organisms, such as the construction of complex systems. Determination of the functions of RTT-related proteins provides insight into the mechanism of RTT onset.

K-22: Identification and analysis of human unitary pseudogenes - a story of gene loss and phenotype diversification
COSI: EvoCompGen COSI
  • Cristina Sisu, Brunel University, United Kingdom
  • Edward Cannon, Brunel University, United Kingdom

Short Abstract: Unitary pseudogenes are a unique class of pseudogenes formed when functional genes acquire disabling mutations that result in the inactivation of the original coding loci. The lack of a functional homolog in the same organism makes identification and characterisation of unitaries challenging. Moreover, the correct annotation of unitary pseudogenes would provide essential information on functional losses during the organism's evolution. Taking advantage of the large number of annotated eukaryotic genomes currently available, we developed a stand-alone pipeline to detect unitary pseudogenes. Here, we analysed gene loss in the human lineage spanning half a million years of evolution from worm to primates, using as a starting point unique protein-coding genes with no human orthologues. Using a binary comparison between each organism and human, we are able to recover the previously annotated pseudogenes and 792 new unitaries. The overall unitary pseudogene repertoire shows an organism-specific evolution and follows previously seen trends in terms of family composition, with a large number of unitaries (zinc finger, membrane proteins, etc.). However, we are also able to distinguish individual loss or gain of function events that characterise organism-specific phenotypes. Surprisingly, we find a larger number of unitaries with respect to other primates compared to non-primates.

K-23: Genome mining for metabolic gene clusters in yeasts
COSI: EvoCompGen COSI
  • Chris Pyatt, Quadram Institute Bioscience, United Kingdom
  • Adam Elliston, Earlham Institute, United Kingdom
  • Ian Roberts, Quadram Institute Bioscience, United Kingdom
  • Jo Dicks, Quadram Institute, United Kingdom
  • Steve James, Quadram Institute, United Kingdom

Short Abstract: Secondary metabolites from a variety of organisms, including bacteria, plants, and fungi, are used in industrial processes, in food production, and in healthcare. They are frequently produced by metabolic pathways that are spatially clustered on the genome and expressed only under certain conditions. Finding new gene clusters via computational genome mining is important in the search for new metabolites. The genomes of 880 taxonomically diverse strains from the UK National Collection of Yeast Cultures (NCYC; http://www.ncyc.co.uk) have been sequenced and offer a unique resource for genome mining. A gene cluster discovery pipeline has been implemented in Python, making use of existing bioinformatics tools. We present a case study of 70 Rhodotorula genomes searched for a known carotenoid-producing gene cluster and, using a combination of established tools and ad hoc methods, for novel clusters. Several clusters are predicted in each species, showing potential for further exploitation. The pipeline aims to fill gaps in gene cluster discovery left by current genome mining tools. This ongoing study reveals marked taxonomic peaks in gene cluster content and secondary metabolite production in specific areas of the yeast phylogeny. Certain Basidiomycete lineages display high numbers of biosynthetic genes often found in clusters, suggesting substantial metabolic potential.

K-24: Distance Measures for Tumor Evolutionary Trees
COSI: EvoCompGen COSI
  • Zach DiNardo, Carleton College, United States
  • Kiran Tomlinson, Carleton College, United States
  • Layla Oesper, Carleton College, United States
  • Anna Ritz, Reed College, United States

Short Abstract: There has been a recent increased interest in inferring the evolutionary tree underlying a tumor’s developmental history. Quantitative measures that compare such trees are vital to benchmarking these algorithmic tree inference methods, understanding the structure of the space of possible trees for a given dataset, and clustering together similar trees in order to evaluate common inheritance patterns. However, few appropriate distance measures exist, and those that do exist have low resolution for differentiating trees or do not fully account for the complex relationship between tree topology and mutation inheritance patterns. Here we present two novel distance measures, Common Ancestor Set distance (CASet) and Distinctly Inherited Set Comparison distance (DISC), that are designed to account for the subclonal mutation inheritance patterns characteristic of tumor evolutionary trees. We apply CASet and DISC to simulated data and two breast cancer datasets and show that our distance measures allow for more nuanced and accurate delineation between tumor evolutionary trees than existing distance measures.

K-25: Evolution of plant growth-promoting and plant-associated pathogenic bacteria in environmental samples
COSI: EvoCompGen COSI
  • Sascha Patz, University of Tübingen, Germany
  • Silke Ruppel, Leibniz Institute of Vegetable and Ornamental Crops (IGZ) e.V., Germany
  • Daniel H. Huson, Algorithms in Bioinformatics, Center for Bioinformatics, University of Tübingen, Germany
  • Caner Bagci, University of Tuebingen, Germany

Short Abstract: The WHO estimates that food-borne diseases cause 600 million cases per year. Recent cases are linked to Escherichia coli O157:H7 and Salmonella found on lettuce and tahini. Ensuring food safety and combating microbial pathogens on plants early is of great importance. However, detecting emerging pathogens in plant metagenomes and differentiating them from plant growth-promoting bacteria (PGPB) remains a challenge, not least because of shared mechanisms, such as those for host colonization and inter-bacterial competition, which are considered virulence factors (VFs). We have screened approximately 9,000 complete bacterial genomes for their plant growth-promoting traits (PGPTs) and VFs, using the following four steps: (1) gene mapping to partially novel ontologies comprising over 1,500,000 newly summarized PGPTs and around 37,500 known VFs, (2) clustering of genomes according to PGPT and VF occurrence patterns using a novel algorithm that uses frequency, genomic locations and variants, (3) generation of profiles indicating pathogenic and beneficial bacteria, and (4) validation using incomplete genomes of known pathogens or PGPBs. With this, we hope to provide a tool to help fight food-borne pathogens. We intend to extend this work to metagenomic applications, for example, to find niches of pathogens and to identify appropriate antagonists.

K-26: An Offline Alignment-free Whole-Genome Distance Calculator for Large-scale Bacterial Analyses
COSI: EvoCompGen COSI
  • Gleb Goussarov, SCK-CEN, Belgium
  • Ilse Cleenwerck, BCCM/LMG Bacteria Collection, LM-UGent, Ghent University, Belgium
  • Mohamed Mysara, SCK-CEN, Belgium
  • Natalie Leys, SCK-CEN, Belgium
  • Aurélien Caralier, Ghent University, Belgium
  • Peter Vandamme, Ghent University, Belgium
  • Rob Van Houdt, SCK-CEN, Belgium

Short Abstract: In the last decade, improvements in sequencing technology and computational methods for comparing genomic sequences have vastly improved our understanding of bacteria. Average nucleotide identity has been the de-facto standard for comparing genomes during this time but shows its limits when challenged with metagenome assembled genomes. Indeed, at about 100 genomes, the time required to perform all possible pairwise comparisons between genomes becomes prohibitive, and screening large databases containing thousands of genomes to identify bacterial species is also not an option. On the other hand, alignment-free methods relying on oligonucleotide counting are a good candidate to perform this task, at the cost of some accuracy. Our software enables users to perform analyses on both small- and large-scale datasets, using a variety of methods, and producing a unified output. It also includes a novel approach which has shown excellent performance for bacterial identification and typing. Here we focus on the results obtained using this approach and illustrate the capabilities of our software on multiple datasets of varying complexity and size.

K-27: Graph-based network analysis of transcriptional regulation pattern divergence in duplicated yeast gene pairs
COSI: EvoCompGen COSI
  • Juris Viksna, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Gatis Melkus, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Peteris Rucevskis, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Edgars Celms, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Karlis Cerans, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Paulis Kikusts, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Lelde Lace, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Martins Opmanis, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Darta Rituma, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Karlis Freivalds, Institute of Mathematics and Computer Science, University of Latvia, Latvia

Short Abstract: The genome of Saccharomyces cerevisiae is among the most extensively studied eukaryotic genomes. A defining event in the evolutionary history of S. cerevisiae was a whole genome duplication (WGD) event approximately 100-200 Ma ago, giving rise to a special class of paralogous genes known as ohnologues. Here we investigate the possible implications of this difference in origin between yeast ohnologues and tandem-duplicated paralogues through the lens of network motif analysis. To achieve this, we generated a transcriptional regulatory network (TRN) from publicly available data and performed an exhaustive graph-based network motif analysis. The prevalence of both complete and partial bi-fan motifs within the context of feed-forward loops and other motifs proved to be an effective means of estimating functional divergence associated with ohnologue and paralogue pairs. We found good agreement between our network divergence measures and sequence similarity, and additionally detected some notable differences in the apparent network divergence patterns of ohnologue and paralogue pairs. Our findings demonstrate that genetic divergence between paired ohnologues as well as paralogues is accompanied by a corresponding divergence in TRN motifs, and that the study of bi-fan motifs is a useful network-based approach for investigating post-WGD ohnologue evolution.

K-28: Choosing amino-acid replacement models
COSI: EvoCompGen COSI
  • Lars Arvestad, Stockholm University, Sweden

Short Abstract: Selecting the most suitable sequence evolution model, using tools like ProtTest and IQ-TREE, is today a common step in phylogenetic tree inference. It has become established practice to use maximum likelihood as the model selection principle, which is computationally demanding: every plausible model (and sub-model) is tested, leading to an unfortunate combinatorial effect. Being able to quickly select an appropriate model, or at least reduce the set of models to test with maximum likelihood, would simplify experimentation. We propose a fast method for choosing models, based on the eigendecomposition of amino acid replacement rate matrices. The method works well on simulated data.

K-29: Transposon amplification as contributor to enhancer redundancy
COSI: EvoCompGen COSI
  • Nicolai Barth, Friedrich-Alexander Universität Erlangen-Nürnberg, Germany
  • Lifei Li, Friedrich-Alexander Universität Erlangen-Nürnberg, Germany
  • Leila Taher, Friedrich-Alexander Universität Erlangen-Nürnberg, Germany

Short Abstract: Many gene networks appear to contain partially redundant regulatory sequences. Distal regulatory sequences with similar activity patterns are commonly referred to as redundant or shadow enhancers. It is tacitly assumed that shadow enhancers mainly originate by means of duplication. However, it is also possible that shadow enhancers originate independently, for instance through independent transposon insertion and subsequent refunctionalization. Here, we investigated whether independent origin of shadow enhancers may be more widespread than generally assumed. We utilized the set of enhancers predicted by the FANTOM5 project. First, we assigned target genes to the enhancers based on the correlation of their activity patterns. Then, we grouped the enhancers according to their target genes and activity patterns to identify groups of shadow enhancers. Shadow enhancers show features that differ significantly from those of non-redundant enhancers. For example, shadow enhancers are less conserved. Approximately half of the enhancers overlap with transposons, pointing to the importance of transposon amplification in enhancer evolution. Moreover, when compared in a pairwise manner, shadow enhancers overlap with different types of transposons and thus are likely to have originated independently from one another. Our results provide evidence that independent origin is indeed a widespread mechanism of shadow enhancer evolution.

K-30: Accurate and Efficient Cell Lineage Tree Inference from Noisy Single Cell Data: the Maximum Likelihood Perfect Phylogeny Approach
COSI: EvoCompGen COSI
  • Yufeng Wu, Computer Science and Engineering Department, University of Connecticut, United States

Short Abstract: Cells in an organism share a common evolutionary history, called the cell lineage tree. The cell lineage tree can be inferred from single cell genotypes at genomic variation sites. There is significant noise in single cell genotypes called from sequence data. Cell lineage tree inference from noisy single cell data is a challenging computational problem. Most existing methods for cell lineage tree inference assume uniform uncertainty in genotypes. A key missing aspect is that real single cell data usually has non-uniform uncertainty in individual genotypes. In this paper, we propose a new method called ScisTree, which infers the cell lineage tree and calls genotypes from noisy single cell genotype data. Unlike most existing approaches, ScisTree works with uncertain genotypes in the form of individualized genotype probabilities (which can be computed by existing single cell genotype callers). This allows better utilization of the information about uncertain genotypes from single cell sequence data. ScisTree assumes the infinite sites model, which leads to the well-known perfect phylogeny formulation. Given uncertain genotypes with individualized probabilities, ScisTree infers the cell lineage tree and calls the genotypes that allow a perfect phylogeny and maximize the likelihood of the genotypes. ScisTree can also impute the so-called doublets from noisy data.
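
The perfect phylogeny condition mentioned above can be illustrated with a minimal sketch. It assumes binary genotypes (0 = ancestral, 1 = mutated) with a known all-zero root, and uses the classical three-gamete test: a matrix admits a rooted perfect phylogeny exactly when no pair of sites shows all three of the patterns (0,1), (1,0) and (1,1). The function names are hypothetical. This only checks whether called genotypes are compatible; it does not reproduce the maximum likelihood search of ScisTree.

```python
from itertools import combinations


def conflicting_site_pairs(genotypes):
    """Return pairs of sites (columns) that violate the three-gamete condition.

    genotypes: list of rows, one per cell, each a list of 0/1 values per site.
    """
    n_sites = len(genotypes[0])
    conflicts = []
    for a, b in combinations(range(n_sites), 2):
        gametes = {(row[a], row[b]) for row in genotypes}
        if {(0, 1), (1, 0), (1, 1)} <= gametes:
            conflicts.append((a, b))
    return conflicts


def admits_perfect_phylogeny(genotypes):
    """True if the binary matrix is compatible with the infinite sites model."""
    return not conflicting_site_pairs(genotypes)


if __name__ == "__main__":
    clean = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]   # toy data
    noisy = [[1, 0, 0], [1, 1, 0], [0, 1, 1]]   # one genotype flipped
    print(admits_perfect_phylogeny(clean), conflicting_site_pairs(noisy))
```

In the toy example, a single changed genotype makes sites 0 and 1 incompatible, which is the kind of conflict that noisy single cell calls introduce.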

K-31: Machine learning based Phylogenetic profiling (MLPP) - using local co-evolution for functional interaction prediction and uncovering evolutionary insights
COSI: EvoCompGen COSI
  • Yuval Tabach, The Hebrew University of Jerusalem, Israel
  • Doron Stupp, Hebrew University of Jerusalem, Israel

Short Abstract: Phylogenetic profiling is an established method for predicting functional interactions. Recently, our lab and others have shown that local (clade-wise) co-evolution improves the predictive power of phylogenetic profiling. Moreover, it is hypothesized that different types of pathways co-evolve in different manners; for example, signaling pathways may “re-route” more often than metabolic pathways, leading to different patterns for a pathway type as a whole. Thus, we sought to develop a method for predicting functional interactions as well as their interaction context by a machine learning based phylogenetic profiling approach. Trained on the Reactome database to predict functional interactions and top-level pathway, our approach achieves a drastically improved performance over classic phylogenetic profiling approaches. In addition, through model interpretability, the model recognizes informative clades where loss events occurred. For example, the prediction for the Krebs cycle identifies the loss of the pathway in the parasitic Microsporidia clade as important, linking trait to gene to function. We believe our approach can be used to better predict functional interactions, as well as for the novel prediction of interaction context. Moreover, evolutionary insights can be derived to improve our understanding of how pathways evolve.
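
The difference between global and clade-wise co-evolution can be illustrated with a minimal sketch of classical phylogenetic profiling. It assumes binary presence/absence profiles and Pearson correlation as the similarity measure; the clade names, species indices, and function names are hypothetical, and this is not the MLPP model itself.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists.

    Returns 0.0 when either profile is constant (correlation undefined).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def cladewise_correlation(profile_a, profile_b, clades):
    """Correlate two profiles separately within each clade.

    clades: dict mapping a clade name to the list of species indices in it.
    """
    return {
        name: pearson([profile_a[i] for i in idx], [profile_b[i] for i in idx])
        for name, idx in clades.items()
    }


if __name__ == "__main__":
    # Toy presence (1) / absence (0) profiles over eight species.
    gene_a = [1, 0, 1, 1, 1, 0, 1, 0]
    gene_b = [0, 1, 1, 1, 1, 0, 1, 0]
    clades = {"clade_1": [0, 1, 2, 3], "clade_2": [4, 5, 6, 7]}
    print(pearson(gene_a, gene_b))
    print(cladewise_correlation(gene_a, gene_b, clades))
```

In this toy case the two genes are lost together in the second clade, a signal that the global correlation dilutes and the clade-wise correlation shows clearly.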

K-32: Computing a Yeast tree of life
COSI: EvoCompGen COSI
  • Adam Elliston, Earlham Institute, United Kingdom
  • Ian Roberts, Quadram Institute Bioscience, United Kingdom
  • Ann-Marie Keane, Quadram Institute, United Kingdom
  • Jo Dicks, Quadram Institute, United Kingdom
  • Steve James, Quadram Institute, United Kingdom

Short Abstract: Phylogenetic analysis both informs our view of the divergence of species and develops a framework on which to view and exploit phenotypic information. The UK National Collection of Yeast Cultures (NCYC; http://www.ncyc.co.uk) consists of over 4,000 diverse strains, ideal for the construction of such a framework for yeast. Recent genome sequencing of ~1,000 NCYC strains has provided raw material for next-generation sequencing (NGS)-based tree estimation. Several NGS-based approaches to phylogenetic analysis have emerged in the past few years. One highly popular approach uses feature frequency profiles (Sims et al., 2009), essentially comparing frequency distributions of k-mers between genome pairs as a proxy for evolutionary divergence. This approach is simple to use but has shown some problems with computational efficiency and in taking biological features into account. Here, we present a comparison of multi-locus sequence, whole genome SNP, and NGS-based phylogenetic approaches, focussing on a large well-studied set of Saccharomyces complex strains. The success of the various approaches was assessed by computational measures (e.g. Robinson-Foulds distances, Mantel tests). Simulation studies were also used to assess the accuracy of the different phylogenetic methods. The results will inform future work aiming to develop new NGS-based approaches that incorporate additional biological knowledge.
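
One of the tree comparison measures named above, the Robinson-Foulds distance, can be illustrated with a minimal sketch. It assumes trees are given as nested Python tuples with string leaf labels, that both trees share the same leaf set, and that the unrooted, unnormalized variant is wanted (the count of non-trivial bipartitions found in exactly one tree). The function names and the leaf labels are hypothetical; this is not the evaluation code used in the study.

```python
def leaf_sets(tree, out):
    """Collect the leaf set under every internal node into out; return this node's leaves."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def bipartitions(tree):
    """Non-trivial bipartitions (splits) of the leaf set induced by the tree's edges."""
    clades = []
    all_leaves = leaf_sets(tree, clades)
    splits = set()
    for clade in clades:
        rest = all_leaves - clade
        if len(clade) > 1 and len(rest) > 1:
            splits.add(frozenset([clade, rest]))
    return splits


def robinson_foulds(tree_1, tree_2):
    """Number of bipartitions present in exactly one of the two trees."""
    return len(bipartitions(tree_1) ^ bipartitions(tree_2))


if __name__ == "__main__":
    t1 = ((("A", "B"), "C"), ("D", "E"))
    t2 = ((("A", "C"), "B"), ("D", "E"))
    print(robinson_foulds(t1, t2))
```

Storing each split as an unordered pair of leaf sets makes the result independent of where the tuple representation places the root.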

K-33: A unified nomenclature for vertebrate olfactory receptors
COSI: EvoCompGen COSI
  • Tsviya Olender, The Weizmann Institute of Science, Israel
  • Tamsin Jones, European Molecular Biology Laboratory, European Bioinformatics Institute, United Kingdom
  • Elspeth Bruford, European Molecular Biology Laboratory, European Bioinformatics Institute, United Kingdom
  • Doron Lancet, The Weizmann Institute of Science, Israel

Short Abstract: Olfactory receptors (ORs) are GPCRs with a crucial role in odor detection. A typical mammalian genome harbors ~1000 OR genes and pseudogenes; however, different gene duplication/deletion events have occurred in each species, resulting in complex orthology relationships. While for human a widely accepted nomenclature is available, based on phylogenetic classification into 18 families and further into subfamilies, for other mammals different nomenclature systems are used, concealing important evolutionary insights. We developed the Mutual Maximum Similarity (MMS) algorithm, a systematic classifier for assigning a human-centric nomenclature to any OR gene based on inter-species hierarchical pairwise similarities, and applied it to the OR repertoires of 7 mammals and zebrafish (10,247 ORs). The availability of a unified nomenclature provides a framework for evolutionary studies, where textual symbol comparison allows an immediate identification of potential orthologs and species-specific expansions/deletions, e.g. Or52e5 and Or52e5b represent a duplication of OR52E5 in rat. Another example is the absence of the OR6Z subfamily among primate OR symbols. In other mammals, OR6Z members are located in one genomic cluster, suggesting a large deletion in the primate lineage. This unified nomenclature is applied by the Vertebrate Gene Nomenclature Committee and its implementation is under consideration by relevant species-specific nomenclature committees.

K-34: Kmer-db: instant evolutionary distance estimation
COSI: EvoCompGen COSI
  • Sebastian Deorowicz, Silesian University of Technology, Poland
  • Adam Gudys, Silesian University of Technology, Poland
  • Maciej Długosz, Silesian University of Technology, Poland
  • Marek Kokot, Silesian University of Technology, Poland
  • Agnieszka Danek, Silesian University of Technology, Poland

Short Abstract: Large volumes of data generated during the course of sequencing thousands of prokaryotic organisms (e.g., NCBI Pathogen Detection project) require fast analysis methods. Short substrings of nucleotide sequences, called k-mers, are commonly used in this area as they can be extracted from genomes or sequencing reads, allowing alignment-free approximation of average nucleotide identity (ANI). Therefore, k-mers are often used for phylogeny reconstruction, bacteria identification, or metagenomic classification. The existing solutions for k-mer-based evolutionary analyses (e.g., Mash) are slow and need a lot of memory. This imposes the use of small subsets of k-mers, known as sketches, and limits the applicability to closely related genomes whose representation can be reduced without loss of accuracy. We present Kmer-db, a new tool for evolutionary reconstruction, which is free of this limitation. The estimation of similarities of 40 715 pathogen genomes on the basis of the full 20-mer spectrum fitted in 40 GB of RAM and required 2 h 30 min, less than Mash needed for a 500 times smaller representation (sketch size 10000). When executed with the same number of k-mers as the competitor, Kmer-db processed the dataset in under 7 minutes, 26 times faster than Mash. This confirms the readiness of Kmer-db for processing larger datasets which are to appear in the future.
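
The idea of estimating evolutionary distance from shared k-mers can be illustrated with a minimal sketch. It assumes the full k-mer set of each sequence is compared with the Jaccard index and converted to a distance with the formula published for Mash, D = -(1/k) ln(2j / (1 + j)). Reverse complements, sketching, and the data structures of Kmer-db are all ignored, and the function names are hypothetical.

```python
from math import log


def kmer_set(seq, k):
    """Set of all k-mers in seq (no canonicalization of reverse complements)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index of two sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mash_distance(j, k):
    """Mash-style distance from a Jaccard index j and k-mer length k."""
    if j == 0:
        return 1.0  # no shared k-mers: report the maximum distance
    return -log(2 * j / (1 + j)) / k


if __name__ == "__main__":
    s1, s2 = "ACGTACGT", "ACGTTCGT"  # toy sequences
    j = jaccard(kmer_set(s1, 3), kmer_set(s2, 3))
    print(j, mash_distance(j, 3))
```

Using the full k-mer spectrum, as here, avoids the sampling error of a sketch at the cost of memory, which is the trade-off the abstract addresses.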

K-35: EVOLUTIONARY MOTIFS - A NOVEL WAY TO DEFINE EVOLUTION ACROSS A THOUSAND SPECIES
COSI: EvoCompGen COSI
  • Hodaya Beer, Hebrew University of Jerusalem, Israel
  • Dana Sherill-Rofe, Hebrew University of Jerusalem, Israel
  • Doron Stupp, Hebrew University of Jerusalem, Israel
  • Yuval Tabach, Hebrew University of Jerusalem, Israel

Short Abstract: The genomic revolution enables the study of gene evolution and conservation patterns (i.e. presence or absence) in a set of genomes. Analyzing the co-evolution (phylogenetic profiling) of human genes shows extremely variable evolutionary patterns. For example, some genes show a relatively high conservation score, much higher than expected based on their phylogenetic distance. These may be compared to genes whose phylogenetic profile incline correlates with evolution. These patterns were never classified, and currently gene evolution is simply defined in terms of conserved or not. In this work we identify several recurring phylogenetic patterns, called ‘Phylo-motifs’, throughout 1154 species. Phylo-motifs are annotations of evolutionary patterns which encompass insights about the genes' roles. We further determined mathematical criteria for each motif, enabling computational clustering methods to apply the motifs to all the human genes. We validated the criteria by ensuring that they capture expected gene profiles. We also found new evolutionary concepts by studying the overall enrichment of biological terms among Phylo-motif genes. Phylo-motifs are evolutionary key-words to define and explain the observed co-evolution profiles of genes. We hope that this work will suggest a new language for phylogenetic profiling and enhance our understanding of the evolution of genes.

K-36: Viruses adopt non-optimal codon usage
COSI: EvoCompGen COSI
  • Gon Carmi, The Azrieli Faculty of Medicine, Bar-Ilan University, Israel
  • Alessandro Gorohovski, The Azrieli Faculty of Medicine, Bar-Ilan University, Ukraine
  • Rajesh Detroja, The Azrieli Faculty of Medicine, Bar-Ilan University, Israel
  • Naamah Bloch, The Azrieli Faculty of Medicine, Bar-Ilan University, Israel
  • Meir Shamay, The Azrieli Faculty of Medicine, Bar-Ilan University, Israel
  • Dorith Raviv Shay, The Azrieli Faculty of Medicine, Bar-Ilan University, Israel
  • Milana Frenkel-Morgenstern, The Azrieli Faculty of Medicine, Bar-Ilan University, Israel

Short Abstract: Introduction: The genetic code has redundancies. Many viruses introduce their tRNAs inside the host cells. We proposed that viruses introduce their own tRNAs preferentially for non-optimal codons in order to infect multiple hosts. The rationale is that viruses with codon usage matched to a particular host reduce their ability to infect multiple hosts efficiently. Methods: For 115 viruses with tRNA genes and the corresponding 55 hosts, the codon usage preferences were estimated by our unique non-optimality score. Moreover, the codon usage tables were calculated for each virus/host. A mathematical model has been proposed to explain the non-optimal codon usage preferences for viruses during the host cell-cycle. In particular, the latent and lytic states of viruses were compared to identify unique features. Results: We found a high correlation between the non-optimal codon usage preferences of viruses and the number of hosts that they affect (Pearson correlation coefficient of 0.7). In particular, latent genes of human herpes viruses adopt non-optimal codon usage, e.g., KSHV and EBV. Discussion: Our results indicate that, for infection of multiple hosts, non-optimal codon usage in viruses is a unique evolutionary strategy of adaptation. This strategy enables viruses to efficiently spread into different niches and hosts.

K-37: Accounting for calibration uncertainty: Bayesian molecular dating as a “doubly intractable” problem
COSI: EvoCompGen COSI
  • Stephane Guindon, CNRS, France

Short Abstract: This study introduces a new Bayesian technique for molecular dating that explicitly accommodates uncertainty in the phylogenetic position of calibrated nodes derived from the analysis of fossil data. The proposed approach thus defines an adequate framework for incorporating expert knowledge and/or prior information about the way fossils were collected in the inference of node ages. Although it belongs to the class of “node-dating” approaches, this method shares interesting properties with “tip-dating” techniques. Yet, it alleviates some of the computational and modeling difficulties that hamper tip-dating approaches. The influence of fossil data on the probabilistic distribution of trees is the crux of the matter considered here. More specifically, among all the phylogenies that a tree model (e.g., the birth–death process) generates, only a fraction “agree” with the fossil data. Bayesian inference under the new model requires taking this fraction into account. However, evaluating this quantity is difficult in practice. A generic solution to this issue is presented here. The proposed approach relies on a recent statistical technique, the so-called exchange algorithm, dedicated to drawing samples from “doubly intractable” distributions.

K-38: Reconstructions of 250 ancestral genomes across the eukaryotic kingdom
COSI: EvoCompGen COSI
  • Matthieu Muffato, European Molecular Biology Laboratory - European Bioinformatics Institute, United Kingdom
  • Nga Thi Thuy Nguyen, Institut de Biologie de l’ENS (IBENS), Département de biologie, École normale supérieure, CNRS, INSERM, Université PSL, France
  • Alexandra Louis, Institut de Biologie de l’ENS (IBENS), Département de biologie, École normale supérieure, CNRS, INSERM, Université PSL, France
  • Camille Berthelot, Institut de biologie de l’Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Université, France
  • Hugues Roest Crollius, Institut de biologie de l’Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Université, France

Short Abstract: Large-scale mutational events, such as rearrangements and duplications, play major roles in disease and genome evolution. However, the history of these events remains significantly more challenging to study over long evolutionary times than sequence evolution. Computational methods have been proposed to reconstruct the genome organisation of ancestral species along a species phylogeny, with different trade-offs between accurate resolution of local gene organisation and comprehensive reconstructions of entire chromosomes and karyotypes. Here we present AGORA (Algorithm for Gene Order Reconstruction in Ancestors), a graph-based, recursive parsimony method that estimates gene content and order for every ancestor independently in a species tree. Briefly, AGORA extracts commonalities between pairs of extant genomes to infer inherited and putatively ancestral characteristics. AGORA is fully automated, parallelized and scales to phylogenies including over a hundred extant genomes. We compare the performances of AGORA to several curated ancestral genome reconstructions in vertebrates, plants and fungi and show that AGORA accurately reconstructs ancestral genomes at every scale from local gene-to-gene organization to full chromosomes and karyotypes. We have reconstructed over 250 ancestral genomes across eukaryotes using AGORA, which will be released as part of the Genomicus database.

K-39: Improving classification of novel genes into known gene families via the phylo-kmers
COSI: EvoCompGen COSI
  • Benjamin Linard, LIRMM, France
  • Vincent Ranwez, SupAgro Montpellier, France
  • Céline Scornavacca, ISEM, France
  • Fabio Pardi, LIRMM, France

Short Abstract: One of the most fundamental tasks in genome annotation is to classify new genes into known gene families. It is generally addressed by pairwise alignments to establish similarity scores (Blast) or by profile HMM alignments. Nowadays, the latter is standard because of its scalability, but it still shows limitations: i) when species sampling is biased, HMMs are less representative of the most isolated clades, and ii) HMMs do not account for evolutionary distances that may be available if phylogenies were built for each gene family. Recently, we developed the concept of “phylogenetically-informed kmers” (phylo-kmers) to provide an efficient solution to the problem of alignment-free phylogenetic placement. In the algorithm CLAPPAS, we adapted the phylo-kmer idea to the problem of protein classification into gene families. It relies on a first phase where sets of phylo-kmers are built and indexed for each gene family and a second phase where matches between k-mers from a query gene and pre-computed phylo-kmers are used to assign the gene to its most likely family of origin. Our preliminary results show that in this classification phase, CLAPPAS is already several orders of magnitude faster than HMM-based classification while keeping comparable accuracy.

K-40: A Pan-Cancer Survey of Cell Line/Tumor Similarity by Genomic Profiles using TumorComparer
COSI: EvoCompGen COSI
  • Rileen Sinha, Memorial Sloan Kettering Cancer Center, United States
  • Nikolaus Schultz, Memorial Sloan Kettering Cancer Center, United States
  • Augustin Luna, Department of Cell Biology, Harvard Medical School, Boston, MA, USA, United States
  • Chris Sander, Department of Cell Biology, Harvard Medical School, Boston, MA, USA, United States

Short Abstract: Cell lines derived from human tumors are often used in pre-clinical cancer research, but some cell lines may be too different from tumors to be good models. Genomic and molecular profiles can be used to guide the choice of cell lines suitable for particular investigations, but not all features may be equally relevant. We present TumorComparer, a computational method and web service for comparing cell lines and tumors with the flexibility to place a higher weight on functional alterations of interest. In a first pan-cancer application, we compare 536 cell lines and 8249 tumors of 25 cancer types, using weights emphasizing recurrent genomic alterations. We characterize the similarity of cell lines and tumors within and across cancers, identifying outlier and mislabelled cell lines as well as good matches, and identify cancers with an unusually high number of good or poor representative cell lines. Using multiple data types allows us to identify cell lines which show high similarity to tumors using one data type, but lack important alterations according to other data types. In the future, the weighted similarity method may be useful to assess genomic-molecular patient profiles for the personalized choice of clinical trials or therapy.
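
To make the notion of a weighted comparison concrete, here is a minimal Python sketch of one possible weighted similarity between two binary alteration profiles: the weighted fraction of features on which they agree. The formula, the feature names and the weights are assumptions chosen for illustration; the abstract does not specify how TumorComparer computes its scores.

```python
def weighted_similarity(profile_a, profile_b, weights):
    """Weighted fraction of features on which two binary profiles agree.

    profile_a, profile_b: dicts mapping feature name -> 0/1 (absent/present).
    weights: dict mapping feature name -> non-negative weight.
    Features missing from a profile are treated as 0 (an assumption).
    """
    total = sum(weights.values())
    if total == 0:
        raise ValueError("weights must not sum to zero")
    agree = sum(
        w for feat, w in weights.items()
        if profile_a.get(feat, 0) == profile_b.get(feat, 0)
    )
    return agree / total


# Hypothetical features: recurrent alterations get a higher weight.
weights = {"TP53_mut": 3.0, "KRAS_mut": 3.0, "GENE_X_amp": 1.0}
cell_line = {"TP53_mut": 1, "KRAS_mut": 0, "GENE_X_amp": 1}
tumor = {"TP53_mut": 1, "KRAS_mut": 1, "GENE_X_amp": 1}

sim = weighted_similarity(cell_line, tumor, weights)
print(round(sim, 3))
```

With these toy weights, a disagreement on a highly weighted alteration lowers the score more than a disagreement on a low-weight feature would.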

K-41: An updated view of the oligosaccharyltransferase complex in Plasmodium and other protists
COSI: EvoCompGen COSI
  • Stella Tamana, Molecular Genetics Thalassaemia, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus, Cyprus
  • Vasilis Promponas, Department of Biological Sciences, University of Cyprus, Cyprus

Short Abstract: Asparagine-linked glycosylation is among the fundamental processes conserved throughout all domains of life and the oligosaccharyltransferase (OST) complex is a key enzyme in this process. The eukaryotic OST complex consists of 8 core subunits, integral to the endoplasmic reticulum membrane, with homologs identified across all eukaryotic phyla. Even though several genomes of malaria parasites have been sequenced and intensively annotated, the currently established notion is that only 4 subunits of the OST complex are present in Plasmodium. In this study, we provide unequivocal evidence that all components of the OST complex (with the exception of Swp1/Ribophorin II) are encoded in Plasmodium. Our results are further corroborated by EST/RNA-seq data, indicating that the newly annotated genes are indeed expressed. Importantly, we find that the main reason the unusually short Ost4 subunit had not been characterized in Plasmodium so far is the bias of gene-prediction pipelines against detecting short coding sequences. We additionally identify ‘missing’ OST subunits from other parasitic protists, highlighting the need for a more thorough examination of the composition of the OST complex and suggesting new directions in both biomedical applications and towards elucidating the deep phylogeny of this fundamental post-translational modification in eukaryotes.

K-42: Orthology Benchmarking in the Quest for Orthologs Community
COSI: EvoCompGen COSI
  • Adrian Altenhoff, ETH Zurich, Switzerland
  • Erik Sonnhammer, Stockholm University, Sweden
  • Christophe Dessimoz, University of Lausanne, Switzerland
  • Salvador Capella-Gutiérrez, Barcelona Supercomputing Center (BSC), Spain

Short Abstract: Over the years, several community-driven benchmarking initiatives for computational bioinformatics pipelines (e.g. CASP, CAFA) have become essential tools to compare algorithms that aim to solve the same problem. The Quest For Orthologs benchmark service is an automated, web-based community effort to assess the quality of predicted orthologs using a whole battery of benchmarks. Since publication of the service (Altenhoff et al., Nat Meth 2016), it has gained a substantial user base. Here, we present recent improvements to the service in terms of better QfO reference proteome datasets and novel benchmarks based on domain conservation. We also present our achievements in porting the service onto ELIXIR’s OpenEBench platform, which better enables scaling up to more users and bigger datasets and lowers barriers to adding further benchmarks. Using OpenEBench will contribute to the sustainability of the QfO benchmarking services and can be considered an example of how existing communities can interact with major infrastructures like ELIXIR.

K-43: Analysis of recurrent mutations within cancers reveals widespread patterns of convergent evolution
COSI: EvoCompGen COSI
  • Asli Kucukosmanoglu, Amsterdam UMC-VUMC, Netherlands
  • Carolien L van der Borden, Amsterdam UMC-VUMC, Netherlands
  • Bart Westerman, Amsterdam UMC - VUmc, Netherlands

Short Abstract: Intratumor genetic heterogeneity is commonly caused by stochastic evolution, but can also show remarkable selectivity, resulting in converging evolution. In converging evolution, two or more independent mutations occur in the same gene. We hypothesized that converging evolution is driven by its genetic onset stage. This process could reinforce already existing traits, resulting in an enhancement of a linear pathway rather than complementary/parallel evolution. We generated a prediction model for convergent mutations based on their genetic onset stage. We performed a comprehensive analysis of whole-exome sequencing and copy number variation data from ~10,000 patients across 16 tumor types, obtained from The Cancer Genome Atlas. We found that in 5% of the patients converging evolution occurs at a fixed genetic onset stage. There is a strong relationship between the frequencies of converging mutations and the frequencies of commonly co-occurring mutations in the onset stage. In addition, these co-occurring mutations are frequently observed in the same chromosomal region and/or in the same pathway. Given these strong correlations, and since converging evolution is apparently a highly selective process, our prediction model could point to more effective therapies by choosing those that target both the convergent gene and the commonly co-mutated genes.
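
The definition used in this abstract (two or more independent mutations in the same gene within one patient) can be sketched as a simple counting step in Python. Treating distinct variants in the same gene as a proxy for independent mutations is an assumption of this sketch, as are the function name and the toy records; it does not reproduce the authors' prediction model.

```python
from collections import defaultdict


def convergent_genes(mutations):
    """Return {patient: set of genes carrying >= 2 distinct variants}.

    mutations: iterable of (patient_id, gene, variant) tuples.
    Distinct variants in one gene are used as a stand-in for
    independent mutation events (an assumption of this sketch).
    """
    variants = defaultdict(set)
    for patient, gene, variant in mutations:
        variants[(patient, gene)].add(variant)

    result = defaultdict(set)
    for (patient, gene), seen in variants.items():
        if len(seen) >= 2:
            result[patient].add(gene)
    return dict(result)


# Hypothetical records for two patients.
records = [
    ("P1", "EGFR", "L858R"),
    ("P1", "EGFR", "T790M"),
    ("P1", "TP53", "R175H"),
    ("P2", "KRAS", "G12D"),
    ("P2", "KRAS", "G12D"),  # same variant reported twice: not convergent
]
hits = convergent_genes(records)
print(hits)
```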

K-44: Spliced alignment for the reconstruction of gene and transcript evolution
COSI: EvoCompGen COSI
  • Aida Ouangraoua, Université de Sherbrooke, Canada
  • Safa Jammali, Université de Sherbrooke, Canada
  • Esaie Kuitche Kamela, Université de Sherbrooke, Canada

Short Abstract: Alternative splicing is a powerful mechanism that allows the production of multiple splice transcript variants by genes in eukaryotic organisms. However, current comparative genomics and phylogenetic reconstruction methods make use of a single reference transcript per gene to reconstruct gene family evolution and infer gene orthology relationships. Moreover, most of these methods rely only on sequence similarity/divergence, while neglecting the splicing structure of transcripts, which is also informative. To address these gaps, we have developed a series of algorithms for computing multiple spliced alignments, inferring splicing orthology relationships, and constructing transcript and gene trees. The new methods account for multiple alternative transcripts and both sequence and splicing structure similarity between transcripts. We have also developed a method for the visualization and annotation of the splice variants of a set of homologous genes, based on multiple spliced alignment.

K-45: The rare large structured ncRNA GOLLD in Mycobacterium
COSI: EvoCompGen COSI
  • Sergio Morgado, FIOCRUZ, Brazil
  • Deborah Antunes, FIOCRUZ, Brazil
  • Ernesto Caffarena, FIOCRUZ, Brazil
  • Ana Carolina Vicente, FIOCRUZ, Brazil

Short Abstract: Noncoding RNAs produce transcripts involved in catalytic or regulatory functions, some of them presenting highly complex structures. Among the largest bacterial ncRNAs, GOLLD RNA is thought to occur in specialized groups of bacteria. GOLLD RNA has already been identified in bacteria from the Lactobacillales and Actinomycetales orders, often surrounded by tRNA genes. Here, we mined golld in 7670 Mycobacterium and 2910 mycobacteriophage genomes with the Infernal software, using the GOLLD covariance model from the Rfam database. We identified golld in 351 mycobacteria and 18 mycobacteriophage genomes, mainly associated with tRNA arrays. In addition, two mycobacterial plasmids (264-274 kb) carried golld. Comparison of GOLLD sequences revealed that 33% of the sites were conserved. These sequences were grouped in three clades: one exclusive to Mycobacterium, another comprising Mycobacterium and mycobacteriophages, and a third with mycobacteriophage golld sequences. Due to this golld sequence diversity, we determined the secondary structure of each clade using the R2R software, based on golld alignments generated by Infernal. The 3' golld sequence region from the three clades presented a complex structure consisting of multistem junctions and pseudoknots resembling the canonical GOLLD structure. Our study revealed that the large structured ncRNA GOLLD is spread within Mycobacterium in association with tRNA arrays and occasionally with mobile elements.

K-46: Distinguishing successive ancient polyploidy levels based on genome-internal syntenic alignments
COSI: EvoCompGen COSI
  • Yue Zhang, University of Ottawa, Canada
  • Chunfang Zheng, University of Ottawa, Canada
  • David Sankoff, University of Ottawa, Canada

Short Abstract: A basic tool for studying the polyploidization history of a genome, especially in plants, is the distribution of duplicate gene similarities in syntenically aligned regions of a genome. Often there are two or more peaks, each representing a different polyploidization event. These distributions may be generated by means of a discrete-time, non-homogeneous branching process, followed by a standard sequence divergence model. While the similarity data allow inference of fractionation rates and other parameters, they usually cannot pin down the ploidy level of each event. For a sequence of two events of unknown ploidy, either tetraploid or hexaploid, we base our analysis on high-similarity triples of genes -- triangles. We calculate the probability of the four triangle types with origins in one or the other event, and impose a mutational model so that the distribution resembles the original data. Using a ML transition point in the similarities between the two events as a discriminator for the hypothesized origin of each similarity, we calculate the predicted number of triangles of each hypothesized type for each model combining hexaploidization and/or tetraploidization. This yields a profile of triangle types for each model, which can then be used to assess real genomic data.

K-47: GEMME: a simple and fast global epistatic model predicting mutational effects
COSI: EvoCompGen COSI
  • Elodie Laine, Sorbonne Université - Laboratory of Computational and Quantitative Biology (LCQB, CNRS-SU), France
  • Alessandra Carbone, Sorbonne Université, France

Short Abstract: Natural protein sequences observed today are the result of evolutionary processes selecting for function. They can inform us on which and how sequence variations affect proteins’ biological functions, a central question in biology, bioengineering and medicine. The increasing wealth of genomic data has enabled the accurate prediction of complete mutational landscapes. State-of-the-art methods addressing this problem explicitly or implicitly model inter-dependencies between all positions in the sequence of interest to predict the effect of a particular mutation at a particular position. They infer hundreds of thousands of parameters from very large multiple sequence alignments. They require large variability in the input data and remain time-consuming. Here, we present GEMME (www.lcqb.upmc.fr/GEMME), a fast, scalable and simple method to predict mutational outcomes by considering the evolutionary history that relates natural sequences. GEMME infers evolutionary relationships between sequences by quantifying their global similarities. It then uses these relationships, encoded in a tree, to estimate conservation levels and evolution fits required to accommodate mutations. Assessed against 41 experimental high-throughput mutational scans, GEMME overall performs similarly to or better than existing methods and runs faster by several orders of magnitude. It greatly improves predictions for viral sequences and, more generally, for very conserved families.

K-48: Fast and scalable placement of protein sequences into hierarchical orthologous groups
COSI: EvoCompGen COSI
  • Victor Rossier, University of Lausanne, Switzerland
  • Christophe Dessimoz, University of Lausanne, Switzerland
  • Marc Robinson-Rechavi, Universite de Lausanne, Switzerland

Short Abstract: Hierarchical orthologous groups (HOGs) provide a precise definition for the intuitive notion of gene families and subfamilies. Defined as sets of genes that originated from a common ancestral gene within a given clade, HOGs collectively capture all orthology and paralogy relations within their hierarchical structure. However, current HOG inference methods rely on the computation of countless sequence alignments, which limits orthology knowledge to a few hundred genomes. Here, we present an accurate and scalable method to classify genes into reference HOGs, which leverages an alignment-free approach based on k-mer indexing. One key innovation lies in the ability of our approach to identify one-to-many orthology relations that are frequently falsely predicted as one-to-one orthology when merely relying on the most similar reference sequence. For example, a hemoglobin gene that diverged before the duplication of the alpha and beta hemoglobin subfamilies should be identified as orthologous to both subfamilies. Our method achieved above 90% classification accuracy when simulating an evolutionary distance from mammals to birds and reptiles. Hence, we believe this method will pave the way for large-scale comparative genomic analyses because it optimizes the trade-off between speed and sensitivity of orthology inference in light of the unrelenting growth of genomic data.

K-49: Pairtree: fast cancer phylogeny reconstruction using multiple samples
COSI: EvoCompGen COSI
  • Jeff Wintersinger, University of Toronto, Dept. of Computer Science, Canada
  • Stephanie Dobson, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada, Canada
  • John Dick, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada, Canada
  • Quaid Morris, University of Toronto, Canada

Short Abstract: Tumours are not homogeneous masses, but are instead composed of multiple genetically distinct subpopulations of cells. These genetic differences can affect treatment response. Using genomic sequencing data taken from mixtures of these subpopulations, we can infer which mutations each subpopulation possesses, and the evolutionary relationships between subpopulations. Here we present Pairtree, a novel algorithm for profiling cancerous subpopulations in a patient's tumour. Pairtree can exploit multiple tissue samples taken from a patient, either from different spatial points in the tumour or at different temporal points through treatment. We can, for instance, characterize which evolutionary lineage gave rise to a metastasis or disease relapse, revealing how each subpopulation responded to treatment. This in turn can inform treatment targeted at this lineage. Each additional tissue sample from a patient improves Pairtree's ability to resolve subclonal populations, and to characterize the evolutionary relationships between these populations. However, each additional sample also increases the complexity of the computational problem. Pairtree explicitly models relationships between mutations, allowing its accuracy and resolution to improve with each additional sample. Alternative algorithms, by contrast, cannot deal with this computational complexity, and so exhibit progressively worse accuracy and resolution as the data become richer.

K-50: EVcouplings.org: web service for analysis of protein coevolution and prediction of 3D structures and mutation effects
COSI: EvoCompGen COSI
  • Christian Dallago, (TUM) Technical University of Munich, Germany
  • Thomas Hopf, Department of Systems Biology, Harvard Medical School, Boston, MA, USA, Germany
  • Anna Green, Department of Systems Biology, Harvard Medical School, Boston, MA, USA, United States
  • Benjamin Schubert, Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg 85764, Germany, Germany
  • John Ingraham, Department of Systems Biology, Harvard Medical School, Boston, MA, USA, United States
  • Kelly Brock, Department of Systems Biology, Harvard Medical School, Boston, MA, USA, United States
  • Andrew Diamantoukos, cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA, United States
  • Roc Reguant, (TUM) Technical University of Munich, Denmark
  • Augustin Luna, Department of Cell Biology, Harvard Medical School, Boston, MA, USA, United States
  • Nicholas Gauthier, Department of Cell Biology, Harvard Medical School, Boston, MA, USA, United States
  • Debora Marks, Department of Systems Biology, Harvard Medical School, Boston, MA, USA, United States
  • Chris Sander, Department of Cell Biology, Harvard Medical School, Boston, MA, USA, United States

Short Abstract: Since 2010, Evolutionary Coupling (EC) analysis has greatly advanced in-silico prediction of macromolecular structure, complex formation and mutation effects. The EC method (also known as Direct Coupling Analysis (DCA)) analyses co-variation patterns in a sequence family, builds a global maximum entropy probability model for members of the family and infers residue-residue interaction constraints. The backend prediction pipeline uses the EVcouplings Python package, which is ideal for bioinformatics experts. For general biologists, the EVcouplings.org web server simplifies the use of EVcouplings tools and facilitates interpretation of results. Starting from a protein sequence, users are able to generate multiple sequence alignments (MSAs, viewed via alignmentviewer.org) resulting from database searches, calculate ECs, view interactive contact maps, predict the effect of amino acid mutations, and visualize predicted 3D structures of proteins. Users can also submit protein pairs for inter-protein co-evolution analysis. The resulting MSAs, predicted contact maps and 3D model coordinates can be downloaded in standardized formats for further analysis with third-party applications. Pre-computed jobs on hundreds of protein families, some with known 3D structures for comparison, are available as a resource. The EVcouplings.org submission and analysis server is a one-stop venue for the biological research community.

K-51: DGINN, an automated pipeline for the detection of genetic innovations on genes engaged in evolutionary arms races
COSI: EvoCompGen COSI
  • Lea Picard, Centre International de Recherche en Infectiologie (CIRI), Laboratoire de Biométrie et Biologie Evolutive (LBBE), France
  • Andrea Cimarelli, Centre International de Recherche en Infectiologie (CIRI), France
  • Lucie Etienne, Centre International de Recherche en Infectiologie (CIRI), France
  • Laurent Gueguen, Laboratoire de Biométrie et Biologie Evolutive (LBBE), France

Short Abstract: The identification of cellular proteins that interfere with virus replication is a key challenge in virology. Amongst them, finding those engaged in long-term virus-host interaction and co-evolution is of particular interest. In the host, such selective pressures induce diverse genetic innovations, such as site-specific positive selection, gene copy number variation, recombination, etc. Under the hypothesis that genetic innovations in innate immunity may particularly occur in viral interacting proteins, we developed a pipeline for retrieving orthologous sequences, aligning them and reconstructing their phylogeny, followed by the detection of genetic innovations. This streamlined procedure uniquely allows for the detection of paralogous genes, recombination breakpoints, and signatures of positive selection with several widely-used methods. We validated this evolutionary and predictive pipeline on genes with known selection profiles. Furthermore, after screening 76 genes upregulated in macrophages resistant to HIV infection, we identified thirteen genes potentially encoding novel viral interacting proteins, which we are now functionally characterizing in the lab. Overall, we designed a complete and highly flexible pipeline, available directly on GitHub or as a Docker image, that can screen large datasets, such as the genes from the Interferome database or any dataset of interest to the user.

K-52: Protein Sequence Space: Where Nature Rises above Randomness
COSI: EvoCompGen COSI
  • Andrei Lupas, Max Planck Institute for Developmental Biology, Germany
  • Laura Weidmann, Max Planck Institute for Developmental Biology, Germany
  • Tjeerd Dijkstra, Max Planck Institute for Developmental Biology, Germany
  • Oliver Kohlbacher, Max Planck Institute for Developmental Biology, Germany

Short Abstract: Features of natural protein sequences that distinguish them from random ones are of great interest to better understand the evolution of proteins. Exhaustive methods for detecting such differences have reached their limits at sequence lengths above 5, given that the exponentially growing space quickly becomes only sparsely populated by the limited amount of existing data. In this study, we present an approach that overcomes this great complexity of sequence space to analyze domain-sized natural sequences. We contrasted a large amount of natural data, comprising 1307 bacterial genomes, with a variety of commonly used random models using a distance-based approach. With this, we found that the distribution of natural sequences over sequence space can be best described by the amino acid composition of domains. These results suggest that convergent features shape the global natural sequence space much more strongly than evolutionary descent, which we find to have a mostly local influence.

K-53: Evolution of the Metazoan Protein Domain Toolkit.
COSI: EvoCompGen COSI
  • Maureen Stolzer, Carnegie Mellon University, United States
  • Yuting Xiao, Carnegie Mellon University, United States
  • Daniel Durand, Carnegie Mellon University, United States

Short Abstract: Domains, sequence fragments that encode structural or functional protein modules, are the basic building blocks of proteins. Thus, the set of all domains encoded in a genome is the protein function toolkit of the species. Domain family gain, expansion, and loss drive the evolution of this toolkit. New protein functions can arise via gain of new domains or novel combinations of existing domains, while specialization and streamlining are effected by domain loss. Here, we investigate how changes in domain content are linked to genome and organismal evolution in metazoa, using a phylogenetic birth-death-gain model. Our results show that the relative importance of gain, expansion, and loss varies across lineages, according to a small number of evolutionary strategies. Our results also reveal characteristic evolutionary patterns among domain families. We observe that sets of domain families are evolving in concert, sharing a similar history of events, representation in ancestral genomes, and/or inferred event rates. In many cases, they also share a functional role, linking protein family evolution to innovations in the immune and nervous systems. In summary, the use of a powerful probabilistic birth-death-gain model reveals organizing principles of protein evolution in metazoan genomes.

K-54: ProPhyC: A probabilistic, phylogeny-based approach for hierarchical classification of genomes
COSI: EvoCompGen COSI
  • Yun Zhang, J. Craig Venter Institute, United States
  • Sam Zaremba, Northrop Grumman Health Solutions, United States
  • Christian Zmasek, J. Craig Venter Institute, United States
  • Richard H. Scheuermann, J. Craig Venter Institute, United States

Short Abstract: Hepatitis C Virus (HCV) is a global health burden, with approximately 71 million people chronically infected globally. The virus is classified into genotypes, with each genotype further divided into subtypes, resulting in seven genotypes and 86 subtypes according to the International Committee on Taxonomy of Viruses (ICTV). HCV genotypes/subtypes are found to be important factors impacting patients’ response to various treatments. Therefore, exact genotype/subtype determination is a key factor in selecting treatment regimen options for patients. The large number of different genotypes/subtypes and high similarity between subtypes pose challenges for HCV genotype/subtype assignments. To support HCV typing, we have developed a phylogeny-based approach for type annotation. Besides probabilistically assigning a query genome to one (or more) known genotype/subtype(s), our method also distinguishes between genomes that likely represent novel genotypes/subtypes and those for which assignment is not possible due to insufficient phylogenetic signal. We demonstrate the accuracy and scalability of this tool via the annotation of over 200,000 HCV genomes in the Virus Pathogen Resource (ViPR; www.viprbrc.org) and show that our HCV typing tool can classify over 27,000 HCV genomes that lack type assignments in the GenBank records.

K-55: Growth patterns and driver effects from individual samples provide insights in tumor evolution
COSI: EvoCompGen COSI
  • Leonidas Salichos, Yale University, United States
  • William Meyerson, Yale University, United States
  • Jonathan Warrell, Yale University, United States
  • Mark Gerstein, Yale University, United States

Short Abstract: Evolving tumors accumulate thousands of mutations. Data explosion and whole genome sequencing have led to many methods for detecting cancer drivers which, however, underperform when recurrence is low. Our approach involves harnessing the variant allele frequency (VAF) of mutations in the population of tumor cells in an ultra-deep sequenced single biopsy. We have developed a method that quantifies tumor growth and driver effects for individual samples based solely on the VAF spectrum. Drivers introduce a perturbation into this spectrum, and our method measures that perturbation. To validate our method, we used simulation models to successfully approximate the timing and size of a driver’s effect. Then, we tested our method on 993 linear tumors from the PCAWG Consortium and found that the identified periods of positive growth are associated with known drivers. Finally, we applied our method to an ultra-deep sequenced AML tumor and identified known cancer genes and additional driver candidates. In general, our results shed light on the dynamics of tumor progression, indicating multicellular processes as significantly affected. Moreover, different mutation types appear to have adverse effects on tumor growth. Our method presents opportunities for personalized diagnosis via modeling of tumor progression using deep sequenced whole genome data from an individual.
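
For readers unfamiliar with the VAF spectrum this abstract builds on, the Python sketch below computes the allele frequency of each mutation from read counts (variant-supporting reads divided by total reads) and bins the values into a histogram. The function names, the bin count and the read counts are assumptions for illustration; the sketch covers only this preprocessing step, not the authors' growth or driver-effect model.

```python
def vaf(alt_reads, ref_reads):
    """Fraction of reads at a site that support the variant allele."""
    depth = alt_reads + ref_reads
    if depth == 0:
        raise ValueError("site has no read coverage")
    return alt_reads / depth


def vaf_spectrum(mutations, n_bins=10):
    """Histogram of VAFs over n_bins equal-width bins spanning [0, 1].

    mutations: iterable of (alt_reads, ref_reads) pairs.
    A VAF of exactly 1.0 is placed in the last bin.
    """
    counts = [0] * n_bins
    for alt, ref in mutations:
        idx = min(int(vaf(alt, ref) * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical read counts (alt, ref) for four mutations.
calls = [(50, 50), (15, 85), (12, 88), (5, 95)]
spectrum = vaf_spectrum(calls, n_bins=10)
print(spectrum)
```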

K-56: Harmonising genome annotation through orthology
COSI: EvoCompGen COSI
  • Maarten Reijnders, University of Lausanne, Department of Ecology and Evolution, Switzerland
  • Livio Ruzzante, University of Lausanne, Department of Ecology and Evolution, Switzerland
  • Romain Feron, University of Lausanne, Department of Ecology and Evolution, Switzerland
  • Robert Waterhouse, University of Lausanne, Department of Ecology and Evolution, Switzerland

Short Abstract: Comparative genomics aims to explore commonalities and differences among genes and genomes from the diversity of life on Earth for an enhanced understanding of gene function and evolution, as well as of molecular and organismal biology. However, sequencing technologies and assembly approaches, as well as annotation methods and the amounts and quality of supporting data, also ‘evolve’ and vary greatly. This presents a challenge in large-scale comparative genomics to be able to distinguish between technical artefacts and true patterns of gene sequence, structure, or copy-number evolution. Here we present the development of computational approaches that leverage gene orthology data to harmonise genome annotations and thereby improve the accuracy of gene-based comparative studies. Using profile hidden Markov models to guide gene predictions, orthologues can be recovered from assemblies where they had previously been overlooked or incompletely annotated. We apply these to the genome assemblies of multiple mosquito species that have been sequenced and annotated over the last two decades. This case study highlights the utility of taking advantage of inter-species genomic comparisons to improve genome annotations for more robust evolutionary inferences.

K-57: DAIO: Domain-architecture aware inference of orthologs for the classification of Herpesviridae proteins
COSI: EvoCompGen COSI
  • Christian Zmasek, J. Craig Venter Institute, United States
  • David Knipe, Harvard University, United States
  • Philip Pellett, Wayne State University School of Medicine, United States
  • Richard H. Scheuermann, J. Craig Venter Institute, United States

Short Abstract: Herpesviridae are a large and diverse family of dsDNA viruses which have been implicated in animal and human diseases. Mammalian Herpesviridae have been divided into three subfamilies—Alphaherpesvirinae, Betaherpesvirinae, and Gammaherpesvirinae. In contrast to most other viruses, Herpesviridae have a long evolutionary history. Here we present a systematic phylogenetic and protein domain architecture-based study, encompassing the entire proteomes of all human Herpesviridae, as well as of select non-human herpesviruses. Besides assessing the taxonomic distribution for each herpesvirus protein, we computationally inferred gene duplication events and performed a comparative protein domain architecture analysis. The results indicate that while many herpesvirus proteins evolved without any detectable gene duplication or domain rearrangement event, numerous herpesvirus protein families do exhibit relatively complex evolutionary histories. Some of them acquired additional domains during evolution (e.g. DNA polymerase), whereas others show a combination of domain rearrangements and gene duplications (e.g. US22 domain proteins). We used the results of our analysis to develop a novel classification system for Herpesviridae proteins by clustering proteins into groups of orthologous proteins with shared domain architecture, defined as Strict Ortholog Groups (SOGs). This novel classification system of SOGs for human Herpesviridae proteins is available through the Virus Pathogen Resource (ViPR, www.viprbrc.org).

K-58: Computational approach to compare the evolutionary and cancer-associated breakpoint hotspots on the human genome
COSI: EvoCompGen COSI
  • Golrokh Kiani, Université du Québec à Montréal, Canada
  • Mohamed Amine Remita, Université du Québec à Montréal, Canada
  • Abdoulaye Baniré Diallo, University of Quebec in Montreal (UQAM), Canada

Short Abstract: Genome rearrangements play a part in biological processes such as evolution and cancer development. These events are not randomly distributed across the genome. Although some overlaps between cancer and evolutionary rearrangements have been reported, no systematic method has yet been proposed to evaluate the correlation between the hotspots of cancer and evolutionary breakpoints. Hence, we set out to identify the evolutionary breakpoint hotspots and the distribution of cancer breakpoint hotspots relative to them. Using a multi-way synteny detection approach, we identified ~260,000 evolutionary breakpoints. In addition, 8,708,761 cancer breakpoints were extracted from Genomic Data Commons. We assessed the genomic regions for breakpoint enrichment based on a permutation test. Genomic regions were categorized into three groups: Evolutionary Breakpoint Hotspot Regions (EBHRs), Cancer Breakpoint Hotspot Regions (CBHRs), and the remainder as Breakpoint Refractory Regions. Comparison of hotspots showed a significant partial overlap between EBHRs and CBHRs. These regions also show different affinities to different chromosomes. Moreover, some hotspots are located contiguously in long genomic regions of up to 9.2 Mbp. A study of different functional markers showed that each of the above categories of regions harbors a different functional signature. We are exploring each chromosome as well as those long contiguous regions to better understand such concentration.
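
A permutation test for breakpoint enrichment in a region can be sketched in Python as follows. The null model used here (the same number of breakpoints placed uniformly at random along a single linear genome), the function names and the toy coordinates are assumptions of this sketch; the abstract does not state which null model or region definition the authors used.

```python
import random


def count_in_window(positions, start, end):
    """Number of breakpoints falling in the half-open interval [start, end)."""
    return sum(1 for p in positions if start <= p < end)


def enrichment_p_value(positions, genome_length, start, end,
                       n_perm=1000, seed=0):
    """Empirical one-sided p-value for breakpoint enrichment in a window.

    Each permutation redistributes the same number of breakpoints
    uniformly over the genome (assumed null model) and records whether
    the window holds at least as many breakpoints as observed.
    """
    rng = random.Random(seed)
    observed = count_in_window(positions, start, end)
    at_least = 0
    for _ in range(n_perm):
        shuffled = [rng.randrange(genome_length) for _ in positions]
        if count_in_window(shuffled, start, end) >= observed:
            at_least += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least + 1) / (n_perm + 1)


# Toy example: 15 of 20 breakpoints cluster in one 1 kb window.
genome_length = 100_000
breakpoints = list(range(5000, 5015)) + [20000, 40000, 60000, 80000, 99000]

p_hot = enrichment_p_value(breakpoints, genome_length, 5000, 6000)
p_cold = enrichment_p_value(breakpoints, genome_length, 30000, 31000)
print(p_hot, p_cold)
```

In a genome-wide scan, the per-window p-values would additionally need correction for multiple testing before regions are labelled as hotspots.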

K-59: An integrative computational evolutionary approach to accelerate the discovery of molecular targets in prevalent under-studied pathogens
COSI: EvoCompGen COSI
  • Janani Ravi, Pathobiology and Diagnostic Investigation, Michigan State University, United States
  • Lauren M Sosinski, Pathobiology and Diagnostic Investigation, Michigan State University, United States
  • Philip A Calhoun, Pathobiology and Diagnostic Investigation, Michigan State University, United States
  • Samuel Z Chen, Pathobiology and Diagnostic Investigation, Michigan State University, United States

Short Abstract: Nontuberculous mycobacteria (NTM) are environmental opportunistic pathogens, infecting humans and animals alike resulting in chronic disease (and the loss of millions of dollars). Yet, we lack actionable molecular targets to diagnose, prevent and treat NTM infections. Evolutionary relationships can give us vital clues about potential targets, which can be further refined using structural-functional information. However, the underlying data are diverse and reside in disconnected web-resources, requiring the arduous task of collation. Here, we develop a systematic computational workflow to standardize, pre-compute and integrate genomic resources for understudied, pathogenic NTM that enables the rapid identification of molecular diagnostic targets. Key components of this computational work include a) cataloging and characterizing potential diagnostic targets in well-studied pathogens, and b) developing an evolutionary approach for identification, structural/functional/genomic characterization and comparative pathogenomics of candidate biomarkers. Our preliminary results have helped us identify and functionally characterize virulence factors unique to pathogenic strains of NTM that may serve as candidates. Our computational results will help rapid prioritization of molecular targets for experimental/clinical validation. This comprehensive and generalizable computational approach can be easily extended to the discovery of drug/vaccine targets, and beyond NTM to any other pathogen affecting human, animal or plant health.

K-60: Comparative landscape genomics of foundation woodland tree species
COSI: EvoCompGen COSI
  • Kevin Murray, ANU, Australia

Short Abstract: Spatial genetic structure depicts the outcome of demographic history and selection, and can itself limit adaptation. Eucalyptus consists of foundation tree species that provide essential habitat and modulate ecosystem services throughout Australia, and are planted for fibre and fuel worldwide. Using whole-genome sequencing of several hundred individuals of three Eucalypt species, we found very high genetic diversity within each species. When controlling for continuous isolation by distance (IBD), we found no support for discrete population structure within any of the species. Using generalised dissimilarity modelling, we identified additional isolation by environment (IBE) driven by availability of and demand for moisture, and by soil nutrition. Redundancy analysis-based tests of genotype-environment interaction identified loci associated with environment. These results not only highlight the vast adaptive potential of these species, but also identify key differences in the drivers of genetic structure at the landscape scale.

K-61: Genome-wide analysis of mutational impact on post translational modification
COSI: EvoCompGen COSI
  • Yiwei Ling, Niigata University Graduate School of Medical and Dental Sciences, Japan
  • Hisayoshi Yoshizaki, Department of Pediatric Surgery, Kanazawa Medical University, Japan
  • Shujiro Okuda, Niigata University Graduate School of Medical and Dental Sciences, Japan

Short Abstract: Post translational modification plays a variety of physiological roles. Recent technology has made it possible to identify a large number of phosphorylation sites at the same time. We clustered them by amino acid features and organized them into about 200 phosphorylation motifs. Comparative evolutionary analysis of these phospho-motifs successfully characterized their physiological importance. Then, we investigated associations between cancer-specific mutations and phosphorylation motifs. Of more than 16 million cancer-specific mutations registered in a public database, about 100,000 were found in phosphorylation motif sequences and cause amino acid substitutions. We investigated relationships between these mutations and the evolutionary conservation of each phospho-motif. The frequency of cancer-specific mutations in the phospho-motifs was positively correlated with the evolutionary conservation of the phospho-motifs. In contrast, this correlation was not observed in SNP data from a healthy cohort. Our results suggest that mutations in the phospho-motifs not only destroy the motif structure but also disrupt signaling through the appearance of new motifs created by amino acid substitution.

K-62: Transcriptional rewiring via promoter sequences in Salmonella
COSI: EvoCompGen COSI
  • Wim Cuypers, University of Antwerp, Belgium
  • Sandra Van Puyvelde, Institute of Tropical Medicine Antwerp, Belgium
  • Stijn Deborggraeve, Institute of Tropical Medicine Antwerp, Belgium
  • Kris Laukens, University of Antwerp, Belgium
  • Pieter Meysman, University of Antwerp, Belgium

Short Abstract: Salmonella bacteria efficiently adapt to new niches through the processes of gene acquisition, pseudogenisation and gene deletion. Knowing the precise mechanisms by which these pathogens adapt is vital in the fight against highly resistant bacteria. In this work, we explore the hypothesis that transcriptional rewiring via promoter mutations could contribute to niche adaptation. We studied co-expression patterns of orthologous gene pairs of two very similar Salmonella strains (Salmonella Typhimurium LT2 and 14028s). Correlation matrices were constructed from normalised gene expression data included in the COLOMBOS compendium, and used to estimate the level of expression conservation. Promoter sequence divergence was analysed by pairwise alignment of promoter sequences and assessment of the differences per position. Multiple co-expression patterns were not conserved between the two Salmonella strains, despite their sharing 98% genome identity. The promoter regions displayed major pairwise differences, but transcription start sites and RNA polymerase binding sites were mainly conserved. We found evidence of transcriptional rewiring in two very similar Salmonella strains, indicating that features of the non-coding genome of Salmonella could contain important information about the process of niche adaptation. Ongoing work will clarify how promoter sequence divergence is linked to expression divergence.
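
One common way to turn correlation matrices into a per-gene expression conservation score is sketched below in Python. It is an assumption, not necessarily the authors' procedure: each gene's correlation profile against all other genes is computed in each strain, and the two profiles are then correlated for each ortholog pair. The function names, gene labels and toy expression values are hypothetical, and orthologs are assumed to share the same key in both dictionaries.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(expr):
    """Gene-by-gene co-expression matrix as a nested dict."""
    genes = list(expr)
    return {g: {h: pearson(expr[g], expr[h]) for h in genes} for g in genes}


def expression_conservation(expr_a, expr_b):
    """Correlate each gene's co-expression profile between two strains."""
    corr_a = correlation_matrix(expr_a)
    corr_b = correlation_matrix(expr_b)
    genes = list(expr_a)
    scores = {}
    for g in genes:
        others = [h for h in genes if h != g]
        profile_a = [corr_a[g][h] for h in others]
        profile_b = [corr_b[g][h] for h in others]
        scores[g] = pearson(profile_a, profile_b)
    return scores


# Toy compendia: four orthologous genes measured in four conditions.
strain_a = {"g1": [1, 2, 3, 4], "g2": [1, 2, 3, 5],
            "g3": [4, 3, 2, 1], "g4": [1, 3, 2, 4]}
# In strain B, g4 responds in the opposite direction (a "rewired" gene).
strain_b = dict(strain_a, g4=[4, 2, 3, 1])

same = expression_conservation(strain_a, strain_a)
rewired = expression_conservation(strain_a, strain_b)
```

A score near 1 indicates a conserved co-expression neighbourhood; in the toy data the rewired gene `g4` scores -1 because all of its correlations change sign.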

K-63: Inferring Credible Horizontal Gene Transfers
COSI: EvoCompGen COSI
  • Agnieszka Mykowiecka, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Poland
  • Anna Muszewska, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Poland
  • Pawel Gorecki, University of Warsaw, Poland

Short Abstract: The phenomenon of horizontal gene transfer (HGT) is one of the crucial factors influencing microbial evolution. Among other functions, it plays an important role in the ability of microbes to react quickly to environmental changes. It is also involved in the transmission of virulence genes and is the primary mechanism for the spread of antibiotic resistance. Inference of HGT events can be done using tree reconciliation, in which any incongruence between the topologies of gene and species trees is explained by a biologically consistent scenario with a minimal number of gene duplications, losses, and HGTs. However, it is not always clear how to infer credible events. Here we present a new efficient dynamic programming approach to find evolutionary scenarios in acyclic graphs representing species evolution with HGT and a new measure, based on non-parametric bootstrap, called transfer support, to verify the credibility of inferred transfers. Furthermore, we propose a novel iterative method for the inference of well-supported and time-consistent horizontal gene transfers given a multiple sequence alignment and a species tree. Finally, we provide empirical examples showing that the method can be used to support known transfer hypotheses from the literature.
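
The abstract does not define transfer support precisely. One plausible reading, by analogy with bootstrap support for tree branches, is the fraction of bootstrap replicates whose reconciliation contains a given transfer. The Python sketch below implements only that assumed reading; the function name and the encoding of transfers as (donor, recipient) pairs are hypothetical.

```python
from collections import Counter


def transfer_support(replicate_transfers):
    """Fraction of bootstrap replicates in which each transfer is inferred.

    replicate_transfers: one collection of (donor, recipient) pairs per
    bootstrap replicate, as produced by some reconciliation step that is
    not shown here.
    """
    n_replicates = len(replicate_transfers)
    if n_replicates == 0:
        raise ValueError("at least one bootstrap replicate is required")
    counts = Counter()
    for transfers in replicate_transfers:
        counts.update(set(transfers))  # count each transfer once per replicate
    return {t: c / n_replicates for t, c in counts.items()}


# Toy input: transfers inferred in four bootstrap replicates.
replicates = [
    {("A", "B")},
    {("A", "B"), ("C", "D")},
    set(),
    {("A", "B")},
]
support = transfer_support(replicates)
# support[("A", "B")] is 0.75 and support[("C", "D")] is 0.25
```

Transfers whose support falls below a chosen threshold could then be treated as not credible. The threshold itself would be a further assumption.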

K-64: COVTree: Coevolution in overlapped sequences by tree analysis server
COSI: EvoCompGen COSI
  • Elin Teppa, Sorbonne Université, France
  • Alessandra Carbone, Sorbonne Université, France

Short Abstract: Overlapping genes exist in all domains of life and are especially abundant in viral genomes. The existence of overlapping reading frames increases the risk of deleterious mutations for one of the proteins, since a single nucleotide substitution may affect both proteins. Molecular coevolution may be seen as a mechanism to tolerate or compensate for unfavorable mutations, decreasing the evolutionary constraints in the overlapping region. Although molecular coevolution analysis has been widely applied to viral genomes, the “overlap problem” has been disregarded. Due to the frameshift between the two ORFs, a mutation in one protein may be coupled with one or two consecutive synonymous or non-synonymous substitutions in the other protein. In the overlapping region, coevolution in one ORF may be mirrored by coevolution in the other ORF; may generate a non-synonymous substitution, which in turn may be compensated by other mutations (inside or outside the overlapping region); or may generate synonymous substitutions. These different situations give information about the relative importance of a position for both ORFs. Here, we present a server that facilitates the analysis of coevolution in overlapped proteins and of the impact of mutations on the other ORF. To do so, we combine information at the protein and nucleotide levels. Coevolution analysis is carried out using BIS2TreeAnalyser.

K-65: G-quadruplexes evolution in transcriptomes
COSI: EvoCompGen COSI
  • Aida Ouangraoua, Université de Sherbrooke, Canada
  • Anaïs Vannutelli, Université de Sherbrooke, Canada
  • Jean-Michel Garant, Université de Sherbrooke, Canada
  • Jean-Pierre Perreault, Université de Sherbrooke, Canada

Short Abstract: G-quadruplexes (G4) are secondary structures present in DNA and RNA. A G4 is defined by a stack of G-tetrads, which are planes composed of four guanines linked by Hoogsteen base pairs. Studies on RNA G4 (rG4) have shown their roles in the regulation of many mechanisms, such as translation, alternative splicing, polyadenylation, mRNA localization and miRNA maturation. G4 can act as activators or as inhibitors of these processes. These effects depend on the transcript in which the G4 is located, but also on the location of the G4 within the transcript, for instance in coding or non-coding regions. The variety of G4 functions makes it difficult to fully understand rG4 functionality. Thus, studying rG4 evolution across several transcriptomes should lead to a better understanding of the emergence and spread of G4 during evolution. Diverse species from each domain of life were chosen for this study of rG4 conservation. rG4 are predicted in the transcriptomes of these species. The distribution of predicted rG4 is then analysed to assess their importance in transcriptomes, and to understand their evolution and involvement in cellular mechanisms.
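
The abstract does not say which rG4 predictor is used. As a rough illustration of what sequence-based G4 prediction can look like, the Python sketch below scans a transcript for the widely used canonical motif of four runs of at least three guanines separated by loops of 1 to 7 nucleotides. The motif choice, the loop limits and the function name `find_g4` are assumptions made for this example.

```python
import re

# Canonical G4 motif: four G-runs (>= 3 G) separated by 1-7 nt loops.
G4_PATTERN = re.compile(
    r"G{3,}[ACGU]{1,7}?G{3,}[ACGU]{1,7}?G{3,}[ACGU]{1,7}?G{3,}"
)


def find_g4(sequence):
    """Return (start, end, motif) for non-overlapping canonical G4 matches.

    DNA input is accepted; T is converted to U before scanning.
    Coordinates are 0-based and end-exclusive.
    """
    rna = sequence.upper().replace("T", "U")
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(rna)]


hits = find_g4("AAGGGAGGGAGGGAGGGAA")
# hits == [(2, 17, "GGGAGGGAGGGAGGG")]
no_hits = find_g4("AUGCAUGC")
# no_hits == []
```

A pattern of this kind misses non-canonical G4 (for example those with bulges or two-guanine runs), so dedicated predictors generally use scoring schemes rather than a single regular expression.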

K-66: Gene tree and transcript tree construction using splicing orthology groups
COSI: EvoCompGen COSI
  • Aida Ouangraoua, Université de Sherbrooke, Canada
  • Safa Jammali, Université de Sherbrooke, Canada
  • Esaie Kuitche Kamela, Université de Sherbrooke, Canada
  • Marie Degen, Université de Sherbrooke, Canada

Short Abstract: Recent studies have revealed that alternative splicing plays a major role in the diversification of transcript production by eukaryotic genes. Because each gene can produce more than one transcript, this also leads to diversification of the proteome. However, given a gene family, current gene phylogeny reconstruction methods use a single reference protein per gene to build gene trees. These methods also neglect the exon-intron structures of the transcripts, which are also informative. We have developed a new model to group transcripts of a gene family into splicing orthology groups by accounting for transcript sequence and structure similarities. We then designed a reconciliation-based method for the construction of transcript trees and gene trees. The resulting transcript trees make it possible to locate changes in transcript sets, such as the emergence of new splicing isoforms or the loss of transcripts, along branches of the gene trees. We applied our model to the correction of 1000 gene trees from the Ensembl database, in order to demonstrate the relevance of the gene tree correction method based on our model.

K-67: Whole genome sequencing and comparative genomics of Japanese Horned beetle
COSI: EvoCompGen COSI
  • Yuki Kagaya, Graduate School of Info. Sci., Tohoku University, Japan
  • Kengo Kinoshita, Tohoku University, Japan
  • Yuki Kaga, Graduate School of Life Sci., Tohoku University, Japan
  • Hiroaki Kuki, Graduate School of Sci. and Eng., Saitama University, Japan
  • Ryusuke Yokoyama, Graduate School of Life Sci., Tohoku University, Japan
  • Takeshi Obayashi, Graduate School of Info. Sci., Tohoku University, Japan
  • Masaaki Harada, Graduate School of Info. Sci., Tohoku University, Japan
  • Kazuhiko Nishitani, Faculty of Science, Kanagawa University, Japan

Short Abstract: Trypoxylus dichotomus septentrionalis Kono (Japanese horned beetle) is one of the largest beetles and is widely distributed in Japan. The characteristic feature of this species is a long horn that only males have on the top of their head. Because of its distinctive form, it is a popular species in Japan, especially among children as a pet. Beetles with certain specific phenotypes are also considered highly valuable and are traded at high prices. Here we performed whole genome sequencing of the Japanese horned beetle using both Illumina HiSeq and Nanopore MinION. After hybrid assembly using both read types, a draft genome of about 600 Mbp, together with a long mitochondrial DNA, was obtained. Further analyses, including gene annotation, characterization of the genome, and comparisons with related species to discuss evolutionary aspects of the beetle, will be shown in our poster.

K-68: SeMPI 2.0: A web server for the genome-based prediction of the structure of Natural Products
COSI: EvoCompGen COSI
  • Paul Zierep, Albert-Ludwigs University Freiburg, Germany
  • Stefan Günther, Albert-Ludwigs University Freiburg, Germany

Short Abstract: Bacteria and fungi produce a variety of bioactive secondary metabolites and many of these natural products (NPs) are already applied as effective therapeutic agents. However, for the vast majority of them the molecular structure (and therapeutic potential) is not yet known. Since the basic scaffolds are normally synthesized by corresponding enzymes which are organized in gene clusters, the gene arrangement allows a structure prediction of the NPs from the genomic sequence. The open access web server SeMPI 2.0 provides a comprehensive prediction pipeline, which uses a genome sequence as input data and provides accurate structural predictions of the encoded NPs. Furthermore, the ab initio pipeline compares the predicted putative scaffolds with thousands of already annotated NPs and an estimation of the scaffold novelty is provided.

K-69: Orthology matters when inferring difficult phylogenies - an example on the Lophotrochozoa
COSI: EvoCompGen COSI
  • Adrian Altenhoff, ETH Zurich, Switzerland
  • Christophe Dessimoz, University of Lausanne, Switzerland
  • Jeremy Levy, University College London, United Kingdom
  • Magdalena Zarowiecki, Genomics England, United Kingdom
  • Bartłomiej Tomiczek, University of Gdansk, Poland
  • Alex Warwick Vesztrocy, University College London, United Kingdom
  • Daniel A Dalquen, University College London, United Kingdom
  • Steven Müller, University College London, United Kingdom
  • Maximilian J Telford, University College London, United Kingdom
  • Natasha M Glover, University of Lausanne, Switzerland
  • David Dylus, University of Lausanne, Switzerland

Short Abstract: Orthologous Groups (OGs) contain corresponding genes between species that evolved through a speciation event. In traditional phylogenetics a small number of OGs of highly conserved genes is used to infer a species tree, whereas in phylogenomics a large number of OGs is integrated. The prediction of these groups depends on the method employed, which raises the question of how orthology prediction methods influence the inference of difficult species phylogenies. Here, we show a comparison of 5 major orthology prediction software packages and empirically evaluate their performance on the Lophotrochozoa dataset, a phylogeny that has stirred up some controversy in the community. First, we compare the different methods in terms of group size and number of informative sites using different thresholds for the minimum number of species per OG. We observe that all methods show differences in the number of genes in corresponding OGs. Given a reference tree, we find that with an increasing number of OGs, most methods yield trees that are topologically closer to the reference. Moreover, although there is concurrence between the different trees originating from the 5 methods, no two trees are topologically identical. Finally, we put all computed trees into context with supported branchings from the literature.
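
The abstract does not name the metric used to judge how topologically close a tree is to the reference. A standard choice is the Robinson-Foulds distance, sketched below in Python in its rooted (clade-based) form. The use of this metric, the nested-tuple tree encoding and the placeholder taxon names are all assumptions made for illustration.

```python
def clades(tree):
    """Return (leaf set, set of clades) for a rooted nested-tuple tree.

    Leaves are strings; internal nodes are tuples of subtrees.
    """
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found


def rf_distance(tree_1, tree_2):
    """Rooted Robinson-Foulds distance: clades present in only one tree."""
    leaves_1, clades_1 = clades(tree_1)
    leaves_2, clades_2 = clades(tree_2)
    if leaves_1 != leaves_2:
        raise ValueError("trees must have the same leaf set")
    return len(clades_1 ^ clades_2)


# Placeholder taxa A-D; the two trees disagree on one grouping.
reference = ((("A", "B"), "C"), "D")
candidate = ((("A", "C"), "B"), "D")
distance = rf_distance(reference, candidate)  # 2: clades {A,B} and {A,C}
```

A distance of 0 means the two trees have identical topologies, which by the abstract's account is not the case for any pair of trees from the 5 methods.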

M-67: Systems biology of hybrids
COSI: EvoCompGen COSI
  • Viera Kovacova, IBP, Uni of Cologne, Germany
  • Jeffrey Power, IBP, Uni of Cologne, Germany
  • Fernanda Pinheiro, IBP, Uni of Cologne, Germany
  • Simone Pompei, IBP, Uni of Cologne, Germany
  • Melih Yueksel, IBP, Uni of Cologne, Germany
  • Isabel Rathmann, IBP, Uni of Cologne, Germany
  • Mona Foerster, IBP, Uni of Cologne, Germany
  • Berenike Maier, IBP, Uni of Cologne, Germany
  • Michael Laessig, IBP, Uni of Cologne, Germany

Short Abstract: Horizontal gene transfer is an important factor in bacterial evolution that can act across species boundaries. We know little about the rate and genomic targets of cross-subspecies gene transfer, or about its physiological and selective effects in the recipient organism. Here we address these questions in a parallel evolution experiment with two Bacillus subtilis subspecies of 6.8% sequence divergence. We observe the rapid evolution of hybrids by lateral gene transfer, and we show that these dynamics involve physiological and evolutionary adaptation. The genome-wide uptake of orthologous genes, together with insertions, deletions and de novo mutations, occurs at a staggering rate of (0.26 ± 0.04) % h⁻¹. Each recipient population replaces about 12% of its core genes, and 51% of core genes are replaced in at least one population. While evolved hybrids show a net loss of gene expression compared to the ancestral recipient population, we identify a set of genes whose upregulation is predictive of hybrid fitness. The co-occurrence statistics of orthologous transfers reveal a broad network of fitness epistasis between essential genes. Results show that gene transfer can bridge epistatic barriers between subspecies along multiple high-fitness paths. Cross-subspecies gene transfer rapidly navigates a complex fitness landscape.